Thursday, 1 October 2009

Domain Driven Security and Making Stuff Explicit in the Model

Dear Junior

"But ' OR 1=1 -- is not a valid username! That is just bad input validation!" Well, ' OR 1=1 -- might not look like the kind of username we had in mind, but invalid? Says who?

If we have a look at the code, the signature of the authentication method says:

Integer authenticate(String username, String passwordMD5)

Basically, nothing in the code says that username is any special kind of data; it is just a string. And, as such, it can be any string, including ' OR 1=1 --.

There might be conventions, even documented ones, that a username should have a certain structure, but the model represented in the code considers any string valid to send into the method.

The Domain Driven Design take on this is that if your intended model has more restrictions, then you had better put those restrictions in the code, explicitly.

So, let us take a small step in that direction – let us make Username an explicit part of the model. Later on we can elaborate that part of the model by making restrictions on usernames explicit, and even enforcing them. But let us not take too big a bite – for now we settle for shaping up the model.

If we think about it we can surely agree that username is a special kind of data, separate from amounts, order numbers, or phone numbers. It would simply not make sense to have a phone number “+4615210000” used as a username. This is analogous to the distinction between int and boolean. Under the hood they are both “just bits and bytes”, but we want to keep them distinct in our language so that we do not accidentally use an int as the condition in an if-statement, for example. In C many hard-to-find bugs have been caused by that specific mistake.

In statically typed programming languages like Java, C# or ML, we use the type system, with interfaces and classes, to separate different kinds of data. However, if we audit the authentication code we will see that there is no representation of username on that level. The only place “username” shows up is as the name of String-typed variables and parameters. The knowledge “username is a specific kind of data with its own rules and restrictions” is not explicit in the code.

Enter class Username, which at this stage might be the simplest kind of value object.

    public class Username {
        public final String username; // final making it immutable

        public Username(String username) { this.username = username; }
    }

The important part here is of course that we now have a new type, which can be used by variables, fields, parameters, and returns to make the code explicitly talk about usernames.
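To see what the new type buys us, here is a small sketch. PhoneNumber, Greeter and the demo class are invented for this illustration and are not part of the system discussed here; the only point is that the compiler now refuses to mix the two kinds of data.

```java
// Sketch only: PhoneNumber, Greeter and TypeDistinctionDemo are hypothetical names.
final class Username {
    public final String username; // final making it immutable

    public Username(String username) { this.username = username; }
}

final class PhoneNumber {
    public final String number;

    public PhoneNumber(String number) { this.number = number; }
}

final class Greeter {
    // Accepts only a Username; a raw String or a PhoneNumber is rejected at compile time.
    static String greet(Username user) {
        return "Welcome, " + user.username;
    }
}

final class TypeDistinctionDemo {
    public static void main(String[] args) {
        Username dan = new Username("dan");
        PhoneNumber phone = new PhoneNumber("+4615210000");

        System.out.println(Greeter.greet(dan));
        // Greeter.greet(phone);          // does not compile: PhoneNumber is not a Username
        // Greeter.greet("+4615210000");  // does not compile: String is not a Username
    }
}
```

Both wrappers hold nothing but a String, yet the two commented-out calls are stopped by the compiler rather than discovered at runtime.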

The authentication method will change somewhat.

    /**
     * Authenticates a user with a given password.
     * @param username
     * @param passwordMD5 hash of password
     * @return user id, or null if no matching account
     */
    Integer authenticate(Username username, String passwordMD5)
            throws SQLException {
        Connection con = accountDs.getConnection();
        Statement stmt = con.createStatement();
        String sqlSelect = "SELECT uid FROM Accounts";
        String usernameMatch = "username = '" + username.username + "'";
        String passwdHashMatch = "passwdHash = '" + passwordMD5 + "'";
        String sql = sqlSelect +
                " WHERE " + usernameMatch +
                " AND " + passwdHashMatch;
        ResultSet rs = stmt.executeQuery(sql);
        Integer result;
        if (rs.next()) { // found account with matching password
            result = rs.getInt("uid");
        } else { // no matching account
            result = null;
        }
        return result;
    }

So, whoever wants to call the authentication method with a username must first create a Username object via the constructor.

    public class LoginAction {
        void doit() throws SQLException {
            Username username = new Username(form.username);
            String passwordMD5 = form.password;
            accountService.authenticate(username, passwordMD5);
        }
    }

Now the concept of username is explicit throughout the code, and the code actually speaks the same language as the people working with it. In effect, we have made username a part of the ubiquitous language used to talk about the system.

Note that we are not yet protected from bad usernames; that will be a later step. But at least we talk about usernames, not strings.
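To make that lack of protection concrete, the sketch below builds the query text the same way the authentication method does, but without touching a database. QueryBuilder and InjectionDemo are names invented for this illustration, and the password hash "x" is a placeholder.

```java
// Sketch only: QueryBuilder and InjectionDemo are hypothetical helpers for illustration.
final class Username {
    public final String username; // final making it immutable

    public Username(String username) { this.username = username; }
}

final class QueryBuilder {
    // Concatenates the SQL text exactly as the authenticate method does.
    static String buildSql(Username username, String passwordMD5) {
        String sqlSelect = "SELECT uid FROM Accounts";
        String usernameMatch = "username = '" + username.username + "'";
        String passwdHashMatch = "passwdHash = '" + passwordMD5 + "'";
        return sqlSelect + " WHERE " + usernameMatch + " AND " + passwdHashMatch;
    }
}

final class InjectionDemo {
    public static void main(String[] args) {
        // The wrapper accepts the hostile string without complaint.
        Username hostile = new Username("' OR 1=1 --");
        System.out.println(QueryBuilder.buildSql(hostile, "x"));
        // prints: SELECT uid FROM Accounts WHERE username = '' OR 1=1 --' AND passwdHash = 'x'
    }
}
```

In most SQL dialects everything after the -- is treated as a comment, so the password check disappears from the query. The new type has changed how we talk about the data, not yet what data is let through.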

The distinction between username strings and usernames is subtle. It might seem small, but I think it is essential, as language forms how we think. The moment the programmer starts expressing herself in domain terms (creating a Username object), chances are higher that she will also question the input parameter: Is this string really a username? Where did it come from? Has it been properly checked? No guarantee, but chances are higher.
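For a sense of where those questions lead, here is a sketch of what a later step might look like. None of this is in the code yet. The rule itself (4 to 20 ASCII letters or digits) is purely an assumption made up for this example, as is the ValidationDemo class; a real system would take its rule from the domain.

```java
import java.util.regex.Pattern;

// Sketch only: the validation rule below is an assumed, illustrative one.
final class Username {
    // Assumption: a username is 4 to 20 ASCII letters or digits.
    private static final Pattern VALID = Pattern.compile("[A-Za-z0-9]{4,20}");

    public final String username; // final making it immutable

    public Username(String username) {
        if (username == null || !VALID.matcher(username).matches()) {
            throw new IllegalArgumentException("Not a valid username");
        }
        this.username = username;
    }
}

final class ValidationDemo {
    public static void main(String[] args) {
        System.out.println(new Username("danbj").username);
        try {
            new Username("' OR 1=1 --");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a constructor like this, the question "is this string really a username?" is answered in exactly one place, and every method that takes a Username can rely on the answer.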

We still have some way to go before we have an API that is both easy to use correctly and hard to use incorrectly, but at least we have taken a step in that direction. We still lack the constraints on usernames, and there is no enforcement at all.

However, if we can guide the programmers into thinking about the model a la DDD, and thus decrease the risk of severe application security flaws, then we have at least done something useful.

And it is usefulness that is the ambition of Domain Driven Security.

Yours

Dan

PS My colleague John Wilander just published a nice example of how they handled the Swedish "person number" (roughly a social security number) [Swedish]