Thursday, 8 October 2009

Validating away SQL Injection

Dear Junior
So, if we want to ban OR 1=1 from being used as a username, we have to put that restriction into the model.
We have noticed that viewing the username as just a string does not help us much.
Integer authenticate(String username, String passwordMD5)
So, to be able to talk about usernames in a meaningful way , we introduced the class Username. This actually makes a huge difference as the authentication method is now explicitly distinguishing the username from other kind of indata.

Integer authenticate(Username username, String passwordMD5)

With username as an explicit part of the model, we now can say something like “‘ OR 1=1 is not a valid username”. The beautiful part is that this sentence both speaks immediately about a property in the conceptual mode, and about a property of the code.
The sentence is even well-formed to the degree that it can be turned into a unit test case.

public void shouldNotRegardInjectionAttackStringAsValid() {
assertFalse(new Username("' OR 1=1 --").isValid());

Not bad, if you can write a requirement as a unit test, then you are pretty well off. And, if the test is failing: the better, we have a licence to code, and we know when we are done. Not bad at all.
Implementing, adding a few related tests and going back to green will probably take us to a Username class looking roughly like this.

public class Username {
// final making it immutable
public final String username;
public Username(String username) {
this.username = username;
public boolean isValid() {
return username.matches("[a-z]+");

Hold the horses! We have now modified a class that encodes a central part of our domain at hand. So, we have actually changed the model. That is nothing to take lightly. Now we have to walk the round checking with the other stakeholders in the domain that this change is something we all can agree upon. Is it really ok that usernames should be only lowercase letters and that there must be at least one? In other words, should they look like “danbj”, “danberghjohnsson” and “ilovemylittlepony”?
In the case of usernames people are used to strange formats and restrictions, so my guess is that no one will object, but we have to check – otherwise the model turns from being a shared and agreed model to being “the developers stuff”.
Upholding the model as the ubiquitous language for talking about the system is essential for the quality of the system – it was that language that enabled us to turn a one-phrase requirement and turn it immediately into code. Further on, the quality of the system is essential for its security.
So – for the sake of security, if nothing else, work hard with that model.

Going back to the API we are building, it turns out pretty well. We have achieved that it is easy to use correctly i e, when doing the right thing there is no much doubt. At the moment, you have to create a Username object (constructor), check that it is valid (isValid method) and then pass it to the authentication.
However, we have not yet achieved that the API is hard to use incorrectly, it is not stable to erroneous use. One really weak spot is that it is easy to miss calling the isValid method, and there is nothing in the rest of the API that catches that mistake.
So, we are not finished yet.

ps Of course, what we want to do is to ensure validation, not hoping that our fellow programmers will magically do the right thing.