Saturday, September 24, 2011

Roles versus Capabilities

Fine-grained capabilities are the right code-level abstraction for writing security checks, but most people get it wrong: they test for "roles" instead of the underlying capabilities that should be assigned to roles, which a user may or may not have.

In Java pseudo-code roughly what I'm talking about is:

WRONG!

enum Role { CUSTOMER, DIRECTOR, CEO, PEON, ADMIN, CEO_ADMIN }

if (user.hasRole(Role.DIRECTOR) || user.hasRole(Role.CEO)) {
    createNewEmployee();
} else {
    // Leaving this bogus error message in because I only caught it after the fact
    // (the check above lets CEOs through too, not only directors).
    throw new AuthorizationException("Only directors can create new employees");
}

versus:

GOOD!

enum Capability { CREATE_NEW_EMPLOYEE, CREATE_STACK_OF_PAPER, .... }

if (user.can(Capability.CREATE_NEW_EMPLOYEE)) {
    createNewEmployee();
} else {
    throw new AuthorizationException("User not authorized to create a new employee");
}

The second one is way easier to write and to reason about locally, and is ultimately way more flexible. Pushing the decision into the user abstraction isolates the "what can this user do" question in the user object instead of scattering it all over your code. Each check is then written in terms of what the code is trying to do rather than in terms of roles, which you will find need to be fluid over time. For example, in a capability-based system, it becomes possible to create brand new roles without changing code anywhere except perhaps in your user object.
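A minimal sketch of what such a user object might look like. Everything here beyond the names taken from the snippets above (the capability assignments per role, the User constructor, the use of EnumSet) is an assumption made for illustration, not a prescribed design.

```java
import java.util.EnumSet;
import java.util.Set;

// Only these two capability names appear in the post; a real system would have more.
enum Capability { CREATE_NEW_EMPLOYEE, CREATE_STACK_OF_PAPER }

// Which capabilities each role carries is an illustrative assumption.
enum Role {
    CUSTOMER(EnumSet.noneOf(Capability.class)),
    PEON(EnumSet.of(Capability.CREATE_STACK_OF_PAPER)),
    DIRECTOR(EnumSet.of(Capability.CREATE_NEW_EMPLOYEE, Capability.CREATE_STACK_OF_PAPER)),
    CEO(EnumSet.allOf(Capability.class));

    final Set<Capability> capabilities;

    Role(Set<Capability> capabilities) {
        this.capabilities = capabilities;
    }
}

final class User {
    // Union of the capabilities granted by every role the user holds.
    private final EnumSet<Capability> capabilities = EnumSet.noneOf(Capability.class);

    User(Role... roles) {
        for (Role role : roles) {
            capabilities.addAll(role.capabilities);
        }
    }

    // The only question call sites ever ask.
    boolean can(Capability capability) {
        return capabilities.contains(capability);
    }
}
```

Adding a new role in this sketch means adding one enum constant with its capability set; every user.can(...) call site stays untouched.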

Actually, it goes deeper than that. Most capabilities should come as a "read-capability" versus "change-capability" pair. In the capability model, you should automatically create at least two different capabilities and protect reads and writes separately. (Generally the write capability should also require the read capability, though I'm sure there are strange cases where write-without-read is actually OK.) The full CRUD model would require 4 bits per fundamental operation, though if I allow you to write something, I pretty much give you most of the ability to create and delete it. (Auditing is a good thing to have in place, of course. Separate blog post.)
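One way the "write requires read" rule could be sketched. The salary resource, the requires field, and the Grants class are all hypothetical names invented for this example.

```java
import java.util.EnumSet;

// Hypothetical read/write pair for a single resource.
enum Capability {
    READ_SALARY(null),
    WRITE_SALARY(READ_SALARY);   // write requires read

    // Capability that must accompany this one, or null if there is none.
    final Capability requires;

    Capability(Capability requires) {
        this.requires = requires;
    }
}

final class Grants {
    private final EnumSet<Capability> granted = EnumSet.noneOf(Capability.class);

    // Granting a capability also grants everything it requires.
    void grant(Capability capability) {
        for (Capability c = capability; c != null; c = c.requires) {
            granted.add(c);
        }
    }

    boolean can(Capability capability) {
        return granted.contains(capability);
    }
}
```

A write-without-read case would simply declare its write capability with a null requirement.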

Roles are OK as symbolic names for a set of fine-grained capabilities, and allowing a user to have one or more roles is a fine concept (you just need to union the underlying capabilities, which also gives you a denser in-memory representation of a single user's capabilities - binary is nice!). In many systems that span a single company, you may only need a single definition for each role, but if you are writing software that is supposed to meet the needs of different companies (or different subsidiaries), each company can define the symbolic roles with its own fine-grained control. In some companies a CEO_ADMIN role can create new employees, and in others it can't. The capability-based approach allows these distinctions quite naturally; the role-based approach with global roles doesn't. With the capability approach, your software may eventually let a particular person like Steve Jobs "lend" capabilities to someone else (perhaps with a timeout). Again, you won't have to change 1000 places in your code to create the lend concept, only enhance the user object.
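A rough sketch of both ideas, per-company role definitions and lending with a timeout. The Company class, the lend method, and passing the current time into can(...) are assumptions made to keep the example small and testable, not a claim about how any real system does it.

```java
import java.time.Instant;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum Capability { CREATE_NEW_EMPLOYEE, CREATE_STACK_OF_PAPER }

enum Role { DIRECTOR, CEO_ADMIN }

// Hypothetical: each company supplies its own role -> capability table.
final class Company {
    private final Map<Role, Set<Capability>> roleDefinitions = new EnumMap<>(Role.class);

    void define(Role role, Set<Capability> capabilities) {
        EnumSet<Capability> copy = EnumSet.noneOf(Capability.class);
        copy.addAll(capabilities);
        roleDefinitions.put(role, copy);
    }

    Set<Capability> capabilitiesOf(Role role) {
        return roleDefinitions.getOrDefault(role, EnumSet.noneOf(Capability.class));
    }
}

final class User {
    // Union of the capabilities of the user's roles, as defined by their company.
    private final EnumSet<Capability> own = EnumSet.noneOf(Capability.class);
    // Capabilities lent by someone else, each with an expiry time.
    private final Map<Capability, Instant> borrowed = new EnumMap<>(Capability.class);

    User(Company company, Role... roles) {
        for (Role role : roles) {
            own.addAll(company.capabilitiesOf(role));
        }
    }

    // A user can only lend what they currently hold.
    void lend(Capability capability, User borrower, Instant expiresAt) {
        if (!can(capability, Instant.now())) {
            throw new IllegalStateException("Cannot lend a capability you do not hold");
        }
        borrower.borrowed.put(capability, expiresAt);
    }

    boolean can(Capability capability, Instant now) {
        if (own.contains(capability)) {
            return true;
        }
        Instant expiry = borrowed.get(capability);
        return expiry != null && now.isBefore(expiry);
    }
}
```

The lend concept lives entirely inside User here; code that asks can(...) does not need to know whether a capability came from a role or from a loan.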

I think I know why I see the WRONG! version of the code so often. When product designers create the stories for developers, they invent different roles as an abstraction for thinking about what types of users are using the system. Designers may have to think about security as a matter of law (HIPAA) or as more of a practical customer desire. Developers latch on to this, code to those roles, and then balk later when the implementation team talking to particular customers asks for something different.

It's just so obvious that a capability-based approach plus per-user capabilities (which can be based on the union of the symbolic roles the user has) is the right strategy that it makes me cry when I see role checks in code.

(BTW, of course you can make both the WRONG! version and the GOOD! version of the code tighter by having a method on either the user or the Role called ensureXXX() which would do the test and throw an exception if it fails.)
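A sketch of that ensure-style helper on the capability side. The method name ensureCan, the exception's definition, and the User constructor are assumptions; the post only names AuthorizationException and the ensureXXX() pattern.

```java
import java.util.EnumSet;

enum Capability { CREATE_NEW_EMPLOYEE, CREATE_STACK_OF_PAPER }

// Assumed definition; the post throws this exception without showing it.
class AuthorizationException extends RuntimeException {
    AuthorizationException(String message) {
        super(message);
    }
}

final class User {
    private final EnumSet<Capability> capabilities;

    User(EnumSet<Capability> capabilities) {
        this.capabilities = EnumSet.copyOf(capabilities);
    }

    boolean can(Capability capability) {
        return capabilities.contains(capability);
    }

    // Hypothetical helper: test and throw in one call.
    void ensureCan(Capability capability) {
        if (!can(capability)) {
            throw new AuthorizationException("User not authorized: " + capability);
        }
    }
}
```

With this in place, the GOOD! call site shrinks to user.ensureCan(Capability.CREATE_NEW_EMPLOYEE); followed by createNewEmployee();.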

Is there a cost to the capability approach that the role-based approach doesn't have? I don't really think so. In Java on a 32-bit machine, a single role represented by an enum will typically be stored in a 32-bit field. That by itself is room for 16 read/write capability pairs. If you have a user object kicking around, you've probably also read in their username (at least 8 * 8 bits plus 32 bits for the string length), real-name (at least 10 * 8 bits plus 32 bits for the string length), userid (64 bits), phone-number (10 * 8 bits plus 32 bits for the string length), and maybe even more stuff like their birthday (32 bits), start-date (32 bits), etc. That's already something like 448 bits (96 + 112 + 64 + 112 + 32 + 32), which is 224 read/write capability pairs if you just double the amount of storage you use for this structure you've never worried about passing around before. So you went from maybe 56 bytes to 112 bytes. So what? You can almost afford to assign every form-field in your system its own read/write capability and it still won't clog up your memory. However, you don't want to do that, because then managing the sheer number of capabilities becomes harder (it's better when each one has a sharp name and precise meaning). Do your own math. Packed bits are cheap.
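To make "packed bits" concrete, here is a small hand-rolled sketch that stores one bit per capability in an array of 64-bit words. The class name and the use of integer indexes are assumptions for illustration; in practice java.util.EnumSet, which is documented as being represented internally as a bit vector, would likely do the same job with less code.

```java
// Packs capabilities into 64-bit words, one bit per capability index.
final class PackedCapabilities {
    private final long[] words;

    PackedCapabilities(int capabilityCount) {
        // Round up to a whole number of 64-bit words.
        words = new long[(capabilityCount + 63) / 64];
    }

    void grant(int index) {
        words[index >>> 6] |= 1L << (index & 63);
    }

    boolean can(int index) {
        return (words[index >>> 6] & (1L << (index & 63))) != 0;
    }

    // Payload size only; the JVM's array and object headers are not counted.
    int sizeInBytes() {
        return words.length * Long.BYTES;
    }
}
```

In this sketch, 200 distinct capabilities fit in four words, so the payload is 32 bytes per user.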

(Capabilities become even more interesting in distributed systems, where you can sign them and pass them around. Andrew S. Tanenbaum, creator of the progenitor of Linux, has worked on this, and there's a lot of other great research on this topic.)
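As a rough illustration of "sign them and pass them around", the sketch below tags a capability with an HMAC so that any service holding the shared secret can verify it. The token layout ("userId:capability"), the class name, and the choice of HMAC-SHA256 with a shared secret are assumptions made for this example; they are not taken from the research mentioned above, and a real design would also need expiry and key management.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical token: "<userId>:<capabilityName>" plus an HMAC-SHA256 tag.
final class SignedCapability {
    final String payload;
    final byte[] tag;

    private SignedCapability(String payload, byte[] tag) {
        this.payload = payload;
        this.tag = tag;
    }

    // Issued by whoever holds the secret and decides what the user may do.
    static SignedCapability issue(byte[] secret, String userId, String capability)
            throws GeneralSecurityException {
        String payload = userId + ":" + capability;
        return new SignedCapability(payload, hmac(secret, payload));
    }

    // Any service with the same secret can check the token without a central lookup.
    boolean verify(byte[] secret) throws GeneralSecurityException {
        return MessageDigest.isEqual(tag, hmac(secret, payload));
    }

    private static byte[] hmac(byte[] secret, String message) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        return mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
    }
}
```

A token issued under one secret fails verification under any other, which is what lets it travel between machines without being forged along the way.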
