Tuesday, 27 October 2009

Release Planning Spreadsheet

Dear Junior

From time to time I have had to start setting up a backlog, start tracking velocity, and create a release plan in parallel. This situation is a little bit of a nuisance as I prefer the "scientific approach" of letting the data speak. In this case that means observing the velocity of the team for a few sprints, having a look at the backlog, and then basing the release plan on that data. Inspect and adapt. Inspect velocity, inspect backlog estimates, adapt release plan.

The problem is that this is a little bit hard to do when you are asked to present a release plan while you have no, or just a few, observed velocities to deduce from. Strangely enough, executives seem to be accustomed to project managers with precognitive skills, and I unfortunately lack those. So, those before me have set the expectations, and "not enough data yet" seems to be considered a somewhat "weak excuse". I do not blame the executives; I blame the project managers who have set "speculative guesstimates" as the standard - easy to make, hard to fulfil.

Luckily it is often possible to stall the release plan for some time, which gives some leeway to gather a few velocity observations and get the backlog into decent shape.

When you have at least eight sprint velocity observations, Mike Cohn has a neat short-hand trick for releases at least four sprints into the future; I have mentioned this trick in an earlier letter. However, with less data, I have used the analysis behind that method to squeeze the maximum information out of just a few sprints of observations.

The problem is that you want to give an interval you feel confident with (e g 95% confidence) and it takes some time to gather enough data for creating a decent interval.

In short:

  • velocity from one sprint is nearly useless for this: it gives you an idea about the average, but nothing on the variation
  • velocity from two sprints starts giving you a decent average, but with only two data-points it is hard to judge variation and give an interval
  • velocity from three sprints starts giving you a fair idea of the variation, and you can give an interval

Of course, the calculations can be done through standard statistical computations. However, doing that number exercise is a little bit unnerving when you have lots of other things to think of.
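
For the curious, here is a minimal sketch of one such standard computation. It is not necessarily what the spreadsheet does: it assumes velocities are roughly normally distributed, takes a Student's t confidence interval on the mean velocity, and scales it by the number of remaining sprints. The class and method names are made up for this illustration.

```java
import java.util.Arrays;

/** Sketch: 95% confidence interval for the work covered in the remaining sprints. */
final class VelocityInterval {

    // Two-sided 95% critical values of Student's t for 1..7 degrees of freedom.
    private static final double[] T_95 = {12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365};

    /** Returns {low, high} for the total covered in remainingSprints. */
    static double[] remainingWork(double[] velocities, int remainingSprints) {
        int k = velocities.length;
        if (k < 2 || k > T_95.length + 1) {
            throw new IllegalArgumentException("need 2 to 8 observed velocities");
        }
        double mean = Arrays.stream(velocities).average().getAsDouble();
        double sumSq = 0;
        for (double v : velocities) {
            sumSq += (v - mean) * (v - mean);
        }
        double stdDev = Math.sqrt(sumSq / (k - 1));              // sample standard deviation
        double halfWidth = T_95[k - 2] * stdDev / Math.sqrt(k);  // half-width of the interval on the mean
        return new double[] {
            remainingSprints * (mean - halfWidth),
            remainingSprints * (mean + halfWidth)
        };
    }
}
```

With two observations the multiplier is 12.706, and with three it drops to 4.303, which is one way to see why the third sprint is the first that gives a usable interval.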

Therefore I have put together a handy cheat-sheet, which I by the way did share with a few people who attended my session on Why Release Planning Works at the just-passed Scrum Gathering in München (Munich).

In short, you fill in how many sprints until release, and then observed velocities.

After each observation, the sheet tells you the interval of 95% confidence of how much you will cover in the remaining sprints, and how much the total sum of work will accumulate to.

It also plots out a nice graph if you care to share it with the stakeholders.

A word of advice and warning when presenting the release plan to stakeholders: I prefer to talk about which stories seem to be in the release (before the 95% interval), which are in doubt (those in the interval), and which will not make it (those beyond the interval). If you present the numbers as such (or this graph), they might start taking those numbers a little bit too seriously. You definitely run the risk that upper management might start viewing it as a productivity measure, which will destroy its usability within a few sprints.
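
The three groups can be derived mechanically from a prioritised backlog. The following is only a sketch under assumptions of mine: estimates are in story points in priority order, and low and high are the bounds of the interval for the remaining sprints. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: sort backlog stories into "in", "in doubt" and "out" relative to the interval. */
final class ReleaseForecast {

    enum Outlook { IN, IN_DOUBT, OUT }

    /** estimates are in priority order; low and high bound the work expected to be covered. */
    static List<Outlook> classify(int[] estimates, double low, double high) {
        List<Outlook> result = new ArrayList<>();
        int cumulative = 0;
        for (int estimate : estimates) {
            cumulative += estimate;              // work needed to finish this story and all before it
            if (cumulative <= low) {
                result.add(Outlook.IN);          // fits even in the pessimistic case
            } else if (cumulative <= high) {
                result.add(Outlook.IN_DOUBT);    // fits only if things go well
            } else {
                result.add(Outlook.OUT);         // beyond the optimistic case
            }
        }
        return result;
    }
}
```

Talking about the stories in each group, rather than the bounds themselves, keeps the conversation on content instead of on numbers.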

Composing a good release is also an art in itself, and much more than just picking "most valuable". A good release should contain what Kano-analysis describes as the mandatory features, and a few linear ones - but also at least one "exciter". If not, people might be well satisfied with it, but not raving about it. So, the spreadsheet gives a hint of what "budget" you have for the release.

Anyway: hope you will find the spreadsheet interesting, and at some point useful.

Yours

Dan


Saturday, 24 October 2009

Indata Validation is Not Enough for SQL Injection

Dear Junior

When we do indata validation through value objects we get an application tier that is water-proof. The model describes exactly those data that we think are meaningful and can handle (“username, identifier with which the user presents herself to the system; regexp [a-z]+“). Each piece of indata is validated as part of the value object constructor (public Username(String s)). Indata cannot reach the application without passing validation, as the application methods require the value object type (void authenticate(Username uname…)). What more can you ask for?

- Well, you see, we do not want usernames like “danbj”, we would prefer our real names, like “Dan Bergh Johnsson”.

No problem, we expand the regexp with uppercase letters and space, getting a regexp like [a-zA-Z\ ]+.

- Nice, but our respected colleague Fredrik Jägare-Lilja needs a username as well.

Fair enough – we stuff Scandinavian letters and hyphen into the regexp as well, giving [a-zåäöA-ZÅÄÖ\ \-]+.

- Now there is only one person left: our highly respected Irish colleague Oliver O’Hehir.

Well, well, we are almost finished then, we only need to put the apostrophe into the regexp ending up with something along [a-zåäöA-ZÅÄÖ\ \-\’]+.

Wait, wait, wait!!! Who the h*** just logged in with username “’ OR ‘a’ is not null --“?

Sure, we might have tightened up the regexp to block out that specific attack string and any other malicious use of the format we can think of. But it is always those we did not think of that cause the trouble.

Well, as always we can think that “SQL Injection is solved by prepared statements”, but remember that Injection Flaw is much larger than SQL Injection. The same vulnerabilities might be there when doing LDAP access, using parameters to construct file names (e g Directory Traversal), or if you have some Domain Specific Language (DSL) which you interpret. In any of these cases there might be a string that might well be fully legally formatted, but attacks the structure of how the underlying resources are used.
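
For reference, this is roughly what the prepared-statement remedy looks like for the SQL case. It is a sketch, not code from the system discussed: the class name is made up, and the table and column names are borrowed from the authentication example in an earlier letter.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Sketch: the query structure is fixed up front, data is bound separately. */
final class AccountQueries {

    static final String AUTH_SQL =
        "SELECT uid FROM Accounts WHERE username = ? AND passwdHash = ?";

    /** Returns the user id, or null if no account matches. */
    static Integer authenticate(Connection con, String username, String passwordMD5)
            throws SQLException {
        try (PreparedStatement stmt = con.prepareStatement(AUTH_SQL)) {
            stmt.setString(1, username);      // bound as data, never parsed as SQL
            stmt.setString(2, passwordMD5);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? Integer.valueOf(rs.getInt("uid")) : null;
            }
        }
    }
}
```

This separates structure from data for SQL only. LDAP filters, file names and home-grown DSLs each need their own equivalent, if one exists at all.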

Over to a completely different domain: FM broadcast and music radio. In the FM radio broadcast system you must be able to shut down the transmitters from a remote site. Unfortunately, there is only one way to communicate with the transmitters – via radio. The problem was solved by defining a specific sequence of audio blips (very precise on frequencies, duration, and interval) and denoting that sequence the meaning “shut down the transmitter”.

The pioneering Swedish rap group JustD put that exact sequence as the final beat on one of their songs, without telling anyone. They must have laughed all the way home from the studio. That song has been played on Swedish radio exactly once.

The JustD track hack is a wonderful example of exploiting an Injection Flaw: there is no way to escape it “in band”. I have the same gut feeling about indata validation and SQL Injection.

No matter how we structure the indata model, there might always be some data that actually is valid indata, but causes the system to crash.
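
To make this concrete outside SQL, here is a small sketch of the file-name case. The format rule and the class are invented for the illustration. The point is that a name can pass the format check and still attack the structure, so a separate structural check is needed.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch: format validation and structural validation are two different checks. */
final class ReportFiles {

    // Hypothetical format rule: lowercase letters, digits, dots and slashes.
    static boolean isWellFormed(String name) {
        return name.matches("[a-z0-9./]+");
    }

    // Structural check: the resolved path must stay below the base directory.
    static boolean staysInside(Path baseDir, String name) {
        Path base = baseDir.normalize();
        return base.resolve(name).normalize().startsWith(base);
    }
}
```

With a base directory such as Paths.get("/srv/reports"), the name "../../etc/passwd" is well-formed according to the regexp, yet it does not stay inside the base directory.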

So, indata modelling and validation in all its glory: However necessary it is for upholding security, it is not sufficient.

Yours

Dan


Thursday, 22 October 2009

Util Methods Do Not Work

Dear Junior

When writing the validation logic for the username

public boolean isValid() {
    return username.matches("[a-z]+");
}

I suddenly heard a distant screaming: “Why did you not use the util method for validation?” Ehhrr … sorry … which method? Ohhh, over there … in the se.xyz.services.util.stringutils package there is a util class StringValidationUtil with a validation method.

public class StringValidationUtil {
    static public boolean logincheck(String username) {
        if (!(username.length() > 0)) return false;
        for(int i=0; i<username.length();i++)
            if(!Character.isLowerCase(username.charAt(i)))
                return false;
        return true;
    }
}

I am sorry, I guess I just didn’t find it.

It is strange that I did not find it, because it is actually called as part of account creation when registering a new user. Did I not look for it properly?

Well, it is also strange that I did not find the method in se.xyz.utils.security.AccountTransformUtils, because there you can find

static public boolean okNewUsername(String username) {
    boolean result = true;
    if (username.length() == 0) result = false;
    for(int i=0; i<username.length();i++)
        result = result && Character.isLowerCase(username.charAt(i));
    return result;
}

That method is by the way also called, as part of the check when someone wants to change username. Did I not look properly for that either?

And of course there are some more methods in se.xyz.accmgm.AccountUtil and in the ever-present se.xyz.util.Util that all basically do the same thing.

My real-life record was a util class that contained five different implementations of checking that a string was a date in the format “YYYYmmDD” – and between those implementations, there were subtle differences when handling some strange cases. By the way, there were also three more different implementations in another slightly differently named util class as well.
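
The username checks in this letter already show such a subtle difference. The sketch below puts the two styles side by side; the class and method names are mine, but the logic follows the regexp in isValid and the character loop in logincheck.

```java
/** Sketch: two "equivalent" lowercase checks that disagree on non-ASCII letters. */
final class LowercaseChecks {

    // Regexp style, as in isValid(): ASCII letters a to z only.
    static boolean byRegexp(String s) {
        return s.matches("[a-z]+");
    }

    // Loop style, as in logincheck(): accepts any Unicode lowercase letter.
    static boolean byCharacterLoop(String s) {
        if (s.isEmpty()) return false;
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLowerCase(s.charAt(i))) return false;
        }
        return true;
    }
}
```

Both accept "danbj", but for "j\u00e4gare" (jägare) the regexp says no while the loop says yes. Which one is right is a question about the model, and with the logic scattered over util classes nobody gets to ask it.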

So, how come this multitude of util methods? They are simply not found! The programmer in need of validating that there are only lowercase letters in the string at hand will probably look for the needed method for ten to thirty seconds, whereupon she will implement it herself – after all it is not that difficult. Then, to make “my nice method helpful for everybody else”, it is moved to some util class.

As a side-note, you can note that most util methods are ‘static’. To me ‘static’ in an oo-program means “homeless”. Those methods could reside equally well in any other class. And residing in some obscure hide-away package does not make them easy to find.

For a method to be used, it must be in the middle of the road where the programmer is going. That is what object-orientation is good at, the methods are hung up on the data you have in your hands, so the methods are easy to find.

But unfortunately, static util methods do not work that way. They simply do not work.

Yours

Dan

ps A better place to put validation is inside the corresponding value object

Wednesday, 14 October 2009

Ensuring Indata Validation

Dear Junior

Creating a username class and a validation method has taken us a fair way towards solving SQL Injection by focusing on a domain model API that is both easy to use correctly and hard to use incorrectly. I would say that so far we have managed to make the API easy to use.

Integer authenticate(Username username, String passwordMD5)

public class Username {
    // final making it immutable
    public final String username;

    public Username(String username) { this.username = username; }

    public boolean isValid() { return username.matches("[a-z]+"); }
}

What remains is to ensure that indata validation actually is done. I can see two choices: either putting validation inside the authentication service, or enforcing validation before the call to the authentication service.

Let us first look at putting validation inside authentication.

/** Authenticates a user with a given password.
 * @throws IllegalArgumentException if username invalid
 */
Integer authenticateWithUsernameValidation(Username username, String passwordMD5)
        throws IllegalArgumentException, SQLException {
    if(!username.isValid())
        throw new IllegalArgumentException(
            "Cannot authenticate with invalid username: " + username.username);
    ...
}


This definitely hardens the interface – now there is no possibility to not validate upon authentication. However, the same trick has to be used in every service method around, including the “create new account”, the “change account username” and all those that are to come in the future. The risk is high that the small isValid-call will be missed somewhere – and one hole is all an attacker needs.

Another drawback is the rather awkward “throws IllegalArgumentException” which feels like a very late validation – should not such validation be made much earlier, preferably up in the presentation and client tiers?

An alternative is to not allow invalid usernames to be constructed at all:

@Test(expected = IllegalArgumentException.class)
public void shouldNotCreateUsernameFromInjectionAttackString() {
    new Username("' OR 1=1 --");
}

This requires the constructor to do the validation on the inside, responding with an exception if given an invalid username candidate.

public Username(String username)
        throws IllegalArgumentException {
    this.username = username;
    if(!isValid())
        throw new IllegalArgumentException();
}

Now we also need some way to validate from the outside without taking the pain of provoking and handling an exception, so finally there will be a static method after all:

public static boolean isValid(String username) {
    return username.matches("[a-z]+");
}

Of course the old methods and constructor should be refactored to uphold the don’t-repeat-yourself (DRY) principle. Interestingly enough, this will lead the isValid() method to consistently return true – so I guess we can delete it from the class and inline it wherever it was used. That is, unless we for some bizarre reason want to have a method that explicitly tells the rest of the world that “this object is always valid”.
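
After that refactoring the class might end up roughly as in the sketch below. The null check and the exception message are my additions, not something settled in the model.

```java
/** Sketch: strictly validated value object, with the rule stated in one place. */
final class Username {

    // final making it immutable
    public final String username;

    public Username(String username) {
        if (!isValid(username)) {
            throw new IllegalArgumentException("Invalid username");
        }
        this.username = username;
    }

    /** Lets clients check a candidate from the outside without provoking an exception. */
    public static boolean isValid(String username) {
        // null check is an assumption added for this sketch
        return username != null && username.matches("[a-z]+");
    }
}
```

The instance method isValid() is gone: an object that exists has already passed the check.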

I definitely prefer this latter “strictly-validated-value-object” style over the "validating-service-methods" style. It creates an API that besides being easy to use correctly, also is hard to use incorrectly. It “guides” the client side programmer without being intrusive or obstructive about it.

In some sense, it "enforces" a behaviour upon the client side programmer. However, that does not trouble me. If someone is just nice to me, and doesn’t cause me trouble, I see no obstacle in letting her have her way.

Yours

Dan


Monday, 12 October 2009

Avoid Synonyms in the Ubiquitous Language

Dear Junior

When walking the round trying to establish a new term in the ubiquitous language of a system, it is very tempting to start accepting synonyms.

Perhaps we tried to make 'username' explicit in the model. We then need to settle what a username is, and how it is checked against its validation rules. Among the programmers, we have used username. When we get over to the GUI designers they tend to talk about it as the handle. Later on we find out that the tech writers dig the term alias, and that is what they have used in the manual. It might seem tempting to say we all mean the same thing, so let's accept these as synonyms, whereupon we write three entries in our glossary

username, …”,

handle, see username, and

alias, see username.

So, is that so bad? I guess we can handle three words meaning the same thing. Well, the problem is not the single words, it is the language and the combinatorial explosion.

We also need to define what we call it when we check that a username fulfils the formatting rules. It turns out that we preferred validate; however, those that were on the Gazunga project (regardless of department) seem to prefer checkup. Let us accept these as well and add to our glossary

validate …”, and

checkup, see validate.

Even if each word has a very limited number of synonyms, we now have lots of synonyms for phrases (remember that whereas a glossary is about words, the ubiquitous language is about phrases). There are six synonym phrases for validate username.

  1. validate username
  2. validate alias
  3. checkup handle
  4. validate handle
  5. checkup alias
  6. checkup username

Further on, the user account is by some referred to as pref-set (preference setting space), and by some as area (as in private work area). So, now there are no fewer than eighteen ways to phrase validate the username of the account.
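
The count is simply the product of the number of alternatives for each word: 2 × 3 × 3 = 18. A throwaway sketch that spells the phrases out (the class name is made up) makes the growth easy to see:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: every combination of accepted synonyms is a separate phrase. */
final class SynonymPhrases {

    static List<String> phrases(String[] verbs, String[] nouns, String[] owners) {
        List<String> result = new ArrayList<>();
        for (String verb : verbs) {
            for (String noun : nouns) {
                for (String owner : owners) {
                    result.add(verb + " the " + noun + " of the " + owner);
                }
            }
        }
        return result;
    }
}
```

Called with {"validate", "checkup"}, {"username", "handle", "alias"} and {"account", "pref-set", "area"} it yields eighteen phrases, and one more accepted synonym in any position multiplies the count again.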

Et cetera, you get it.

So, letting synonyms into the ubiquitous language quickly leads to having not one language, but a lot of dialects that quickly drift apart to form separate languages and the ubiquity is gone.

Of course there are occasions when you have to accept synonyms. If there is an established terminology outside the project there is a point in adhering to it, but what to do if there are several competing standards? For example, the finance department might want to use the accepted term imbursement, but the marketers insist that the established term in our customer base is money-forward, both backed by fully acceptable external bodies. In these rare cases we have to accept synonyms, but I still advise denoting one of them as the primary term, and only using the other when necessary. In the glossary it might say money-forward (aka imbursement), a payment made by in exchange for (imbursement preferred by finance department).

By all means, do use synonyms when they are absolutely necessary, but be very restrictive. I promise, if you allow synonyms at an early stage, then confusion will arise somewhere down the road.

Yours

Dan

ps When trying to avoid synonyms you in many ways have the same mind-set as when establishing a canonical data format in the model - but you work in slightly different areas.

Thursday, 8 October 2009

Validating away SQL Injection

Dear Junior

So, if we want to ban OR 1=1 from being used as a username, we have to put that restriction into the model.

We have noticed that viewing the username as just a string does not help us much.

Integer authenticate(String username, String passwordMD5)

So, to be able to talk about usernames in a meaningful way, we introduced the class Username. This actually makes a huge difference as the authentication method is now explicitly distinguishing the username from other kinds of indata.

Integer authenticate(Username username, String passwordMD5)

With username as an explicit part of the model, we now can say something like “‘ OR 1=1 is not a valid username”. The beautiful part is that this sentence speaks immediately both about a property in the conceptual model, and about a property of the code.

The sentence is even well-formed to the degree that it can be turned into a unit test case.

@Test
public void shouldNotRegardInjectionAttackStringAsValid() {
    assertFalse(new Username("' OR 1=1 --").isValid());
}

Not bad: if you can write a requirement as a unit test, then you are pretty well off. And if the test is failing, all the better: we have a licence to code, and we know when we are done. Not bad at all.

Implementing, adding a few related tests and going back to green will probably take us to a Username class looking roughly like this.

public class Username {
    // final making it immutable
    public final String username;

    public Username(String username) {
        this.username = username;
    }

    public boolean isValid() {
        return username.matches("[a-z]+");
    }
}

Hold the horses! We have now modified a class that encodes a central part of our domain at hand. So, we have actually changed the model. That is nothing to take lightly. Now we have to walk the round checking with the other stakeholders in the domain that this change is something we all can agree upon. Is it really ok that usernames should be only lowercase letters and that there must be at least one? In other words, should they look like “danbj”, “danberghjohnsson” and “ilovemylittlepony”?

In the case of usernames people are used to strange formats and restrictions, so my guess is that no one will object, but we have to check – otherwise the model turns from being a shared and agreed model to being “the developers stuff”.

Upholding the model as the ubiquitous language for talking about the system is essential for the quality of the system – it was that language that enabled us to take a one-phrase requirement and turn it immediately into code. Further on, the quality of the system is essential for its security.

So – for the sake of security, if nothing else, work hard with that model.

Going back to the API we are building, it turns out pretty well. We have achieved that it is easy to use correctly, i e, when doing the right thing there is not much doubt. At the moment, you have to create a Username object (constructor), check that it is valid (isValid method) and then pass it to the authentication.

However, we have not yet achieved that the API is hard to use incorrectly; it is not robust against erroneous use. One really weak spot is that it is easy to miss calling the isValid method, and there is nothing in the rest of the API that catches that mistake.

So, we are not finished yet.

Yours

Dan

Thursday, 1 October 2009

Domain Driven Security and Making Stuff Explicit in the Model

Dear Junior

”But ’ OR 1=1 -- is not a valid username! That is just bad indata validation!”. Well, ‘ OR 1=1 -- might not look like the kind of username we had in mind, but invalid? Says who?

If we have a look at the code, the signature of the authentication method says:

Integer authenticate(String username, String passwordMD5)

Basically, in the code there is nothing saying that username is any special kind of data – it is just a string. And, as such, it can be any string – including ’ OR 1=1 --.

There might be conventions, even documented such, that a username should have a certain structure – but the model represented in the code considers any string valid to send into the method.

The Domain Driven Design take on this is that if you have more restrictions in your intended model, then you had better put those restrictions in the code – explicitly.

So, let us take a small step in that direction – let us make Username an explicit part of the model. Later on we can elaborate that part of the model by making restrictions on usernames explicit, and even enforcing them. But let us not take too big a bite – for now we settle for shaping up the model.

If we think about it we can surely agree that username is a special kind of data, separate from amounts, order numbers, or phone numbers. It would simply not make sense to have a phone number “+4615210000” used as a username. This is analogous to the distinction between int and boolean. Under the hood they are both “just bits and bytes”, but we want to keep them distinct in our language so that we do not accidentally use an int as the condition in an if-statement, for example. In C many hard-to-find bugs have been caused by that specific mistake.

In statically typed programming languages like Java, C# or ML, we use the type system with interfaces and classes to separate different kinds of data. However, if we audit the authentication code we will see that there is no representation of username on that level. The only place “username” shows up is as the name of String-typed variables and parameters. The knowledge “username is a specific kind of data with its own rules and restrictions” is not explicit in the code.

Enter class Username, which at this stage might be the simplest kind of value object.

public class Username {
    public final String username; // final making it immutable

    public Username(String username) { this.username = username; }
}

The important part here is of course that we now have a new type, which can be used by variables, fields, parameters, and returns to make the code explicitly talk about usernames.
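
As a small illustration of what the new type buys us, consider the sketch below. PhoneNumber and Greeter are hypothetical classes invented for the example; Username is the simple value object at this stage.

```java
/** Hypothetical second value object, to contrast with Username. */
final class PhoneNumber {
    public final String number; // final making it immutable

    public PhoneNumber(String number) { this.number = number; }
}

/** The simplest kind of value object for usernames. */
final class Username {
    public final String username; // final making it immutable

    public Username(String username) { this.username = username; }
}

/** Hypothetical client whose signature talks about usernames, not strings. */
final class Greeter {
    static String greet(Username username) {
        return "Welcome, " + username.username;
    }
}
```

Greeter.greet(new Username("danbj")) compiles, whereas Greeter.greet(new PhoneNumber("+4615210000")) is rejected by the compiler. With String parameters both calls would have been accepted without a murmur.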

The authentication method will change somewhat.

/** Authenticates a user with a given password.
 * @param username
 * @param passwordMD5 hash of password
 * @return user id, or null if no matching account
 */
Integer authenticate(Username username, String passwordMD5)
        throws SQLException {
    Connection con = accountDs.getConnection();
    Statement stmt = con.createStatement();
    String sqlSelect = "SELECT uid FROM Accounts";
    String usernameMatch = "username = '" + username.username + "'";
    String passwdHashMatch = "passwdHash = '" + passwordMD5 + "'";
    String sql = sqlSelect +
            " WHERE " + usernameMatch +
            " AND " + passwdHashMatch;
    ResultSet rs = stmt.executeQuery(sql);
    Integer result;
    if(rs.next()) { // found account with matching password
        result = rs.getInt("uid");
    } else { // no matching account
        result = null;
    }
    return result;
}

So, whoever wants to call the authentication method with a username, must first create a Username object via the constructor.

public class LoginAction {
    void doit() throws SQLException {
        Username username = new Username(form.username);
        String passwordMD5 = form.password;
        accountService.authenticate(username, passwordMD5);
    }
}

Now the concept of username is explicit throughout the code, and actually talks the same language as the people working with it. In effect, we have made username a part of the ubiquitous language talking about the system.

Note that we are still not yet protected from bad usernames, that will be a later step - but at least we talk about usernames, not strings.

The distinction between username strings and usernames is subtle. It might seem small, but I think it is essential – as the language forms how we think. The moment the programmer starts expressing herself in domain terms (creating a Username object), chances are higher that she will also question the indata parameter: Is this string really a username? Where did it come from? Has it been properly checked? No guarantee, but chances are higher.

We still have some way to cover before we have an API that is both easy to use correctly, and hard to use incorrectly – but at least we have taken a step in that direction. We still lack the constraints on usernames, and there is no enforcement at all.

However, if we can guide the programmers into thinking about the model a la DDD, and thus decrease the risk of severe application security flaws, then we have at least done something useful.

And it is usefulness that is the ambition of Domain Driven Security.

Yours

Dan

PS My colleague John Wilander just published a nice example on how they did with Swedish "person number" (roughly social security number) [Swedish]