Dear Junior - Letters to a Junior Programmer: October 2009

Tuesday, 27 October 2009

Release Planning Spreadsheet

Dear Junior

From time to time I have had to start setting up a backlog, start tracking velocity, and create a release plan, all in parallel. This situation is a bit of a nuisance, as I prefer the "scientific approach" of letting the data speak. In this case that means observing the velocity of the team for a few sprints, looking at the backlog, and then basing the release plan on that data. Inspect and adapt: inspect velocity, inspect backlog estimates, adapt the release plan.

The problem is that this is a little hard to do when you are asked to present a release plan while you have no, or only a few, observed velocities to deduce from. Strangely enough, executives seem to be accustomed to project managers with precognitive skills, and I unfortunately lack those. So those before me have set the expectations, and "not enough data yet" seems to be considered a somewhat "weak excuse". I do not blame the executives; I blame the project managers who have made "speculative guesstimates" the standard: easy to make, hard to fulfil.

Luckily, it is often possible to stall the release plan for some time, which gives some leeway to gather a few velocity data points and get the backlog into decent shape.

When you have at least eight sprint velocity observations, Mike Cohn has a neat shorthand trick for releases at least four sprints into the future; I have mentioned this trick in an earlier letter. With less data, however, I have used the analysis behind that method to squeeze the maximum information out of just a few sprints of observations.

The problem is that you want to give an interval you feel confident about (e.g. 95% confidence), and it takes some time to gather enough data to create a decent interval.

In short:

velocity from one sprint is useless: it gives you an idea of the average, but nothing about the variation
velocity from two sprints starts giving you a decent average, but with only two data points it is hard to judge the variation and give an interval
velocity from three sprints starts giving you a fair idea of the variation, and you can give an interval

Of course, the calculations can be done through standard statistical computations. However, doing that number exercise is a little bit unnerving when you have lots of other things to think of.
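For reference, one standard way of doing that number exercise is a Student's t interval on the mean velocity, scaled by the number of remaining sprints. The spreadsheet's actual formulas are not shown in this letter, so the following Java is only a sketch under stated assumptions: sprint velocities are treated as independent and roughly normally distributed, the interval is for the team's average velocity (a prediction interval for individual sprints would be wider), and the class and method names are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not the spreadsheet's formulas: a 95% confidence interval
 * for the work covered in the remaining sprints, from a few observed velocities.
 * Assumes velocities are independent draws from a roughly normal distribution.
 */
public class VelocityInterval {
    // Two-sided 95% critical values of Student's t, indexed by degrees of freedom (1..10).
    private static final double[] T_95 = {
        Double.NaN, 12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228
    };

    /** Returns {low, high} for the total points covered in the remaining sprints. */
    public static double[] remainingWork(double[] velocities, int remainingSprints) {
        int n = velocities.length;
        if (n < 2) {
            // One observation gives an average but says nothing about variation.
            throw new IllegalArgumentException("Need at least two velocity observations");
        }
        if (n - 1 >= T_95.length) {
            throw new IllegalArgumentException("This small table only covers up to 11 observations");
        }
        double mean = Arrays.stream(velocities).average().getAsDouble();
        double sumOfSquares = 0;
        for (double v : velocities) {
            sumOfSquares += (v - mean) * (v - mean);
        }
        double sampleStdDev = Math.sqrt(sumOfSquares / (n - 1));
        // Half-width of the interval for the mean velocity per sprint.
        double halfWidth = T_95[n - 1] * sampleStdDev / Math.sqrt(n);
        return new double[] {
            remainingSprints * (mean - halfWidth),
            remainingSprints * (mean + halfWidth)
        };
    }

    public static void main(String[] args) {
        double[] observed = {20, 24, 22};
        int remaining = 5;
        double[] interval = remainingWork(observed, remaining);
        double doneSoFar = Arrays.stream(observed).sum();
        System.out.printf("Next %d sprints: %.1f to %.1f points%n", remaining, interval[0], interval[1]);
        System.out.printf("Total at release: %.1f to %.1f points%n",
                doneSoFar + interval[0], doneSoFar + interval[1]);
    }
}
```

The t table also shows why the list above looks the way it does: with two observations the multiplier is 12.706, which makes the interval almost useless, while a third observation brings it down to 4.303.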

Therefore I have put together a handy cheat sheet, which, by the way, I shared with a few people who attended my session on Why Release Planning Works at the recent Scrum Gathering in München (Munich).

In short, you fill in the number of sprints until release, and then the observed velocities.

After each observation, the sheet gives you the 95% confidence interval for how much work you will cover in the remaining sprints, and for what the total amount of work will accumulate to.

It also plots out a nice graph if you care to share it with the stakeholders.

A word of advice and warning when presenting the release plan to stakeholders: I prefer to talk about which stories seem to be in the release (those before the 95% interval), which are in doubt (those within the interval), and which will not make it (those beyond the interval). If you present the numbers as such (or this graph), stakeholders might start taking them a little too seriously. You definitely run the risk that upper management starts viewing the numbers as a productivity measure, which will destroy their usefulness within a few sprints.
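The split into "in", "in doubt" and "out" can be computed mechanically: walk the prioritized backlog, keep a running total of the estimates, and compare it with the interval. A minimal Java sketch; the Story type, the story names and point values, and the interval bounds are hypothetical, and the bounds are assumed to cover only the work remaining (not what is already done).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: sort a prioritized backlog into in / in doubt / out. */
public class ReleaseBuckets {
    public enum Bucket { IN_RELEASE, IN_DOUBT, NOT_IN_RELEASE }

    /** Hypothetical story type: a name and an estimate in points. */
    public static final class Story {
        final String name;
        final int points;
        public Story(String name, int points) { this.name = name; this.points = points; }
    }

    /** Walks the backlog in priority order; assumes story names are unique. */
    public static Map<String, Bucket> classify(List<Story> backlog, double low, double high) {
        Map<String, Bucket> result = new LinkedHashMap<>();
        int cumulativePoints = 0;
        for (Story story : backlog) {
            cumulativePoints += story.points;
            if (cumulativePoints <= low) {
                result.put(story.name, Bucket.IN_RELEASE);      // fits even the pessimistic bound
            } else if (cumulativePoints <= high) {
                result.put(story.name, Bucket.IN_DOUBT);        // falls inside the interval
            } else {
                result.put(story.name, Bucket.NOT_IN_RELEASE);  // beyond even the optimistic bound
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Story> backlog = List.of(
                new Story("login", 30), new Story("search", 40), new Story("export", 20),
                new Story("reports", 25), new Story("themes", 30));
        classify(backlog, 85, 135).forEach((name, bucket) -> System.out.println(name + ": " + bucket));
    }
}
```

Presenting the output as story names per bucket, rather than as the raw point totals, keeps the conversation about content instead of numbers.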

Composing a good release is also an art in itself, and much more than just picking the "most valuable" items. A good release should contain what Kano analysis describes as the mandatory features and a few linear ones, but also at least one "exciter". If not, people might be satisfied with it, but not raving about it. So, the spreadsheet gives a hint of what "budget" you have for the release.

Anyway: I hope you will find the spreadsheet interesting, and at some point useful.

Yours

Dan

Saturday, 24 October 2009

Indata Validation is Not Enough for SQL Injection

Dear Junior

When we do indata validation through value objects, we get an application tier that is watertight. The model describes exactly the data we think are meaningful and can handle (“username: identifier with which the user presents herself to the system; regexp [a-z]+”). Each piece of indata is validated as part of the value object constructor (public Username(String s)). Indata cannot pass through to the application without being validated, as the application methods require the value object type (void authenticate(Username uname…)). What more can you ask for?

- Well, you see, we do not want usernames like “danbj”; we would prefer our real names, like “Dan Bergh Johnsson”.

No problem, we expand the regexp with uppercase letters and space, getting a regexp like [a-zA-Z\ ]+.

- Nice, but our respected colleague Fredrik Jägare-Lilja needs a username as well.

Fair enough – we stuff Scandinavian letters and the hyphen into the regexp as well, giving [a-zåäöA-ZÅÄÖ\ \-]+.

- Now there is only one person left: our highly respected Irish colleague Oliver O’Hehir.

Well, well, we are almost finished then; we only need to put the apostrophe into the regexp, ending up with something like [a-zåäöA-ZÅÄÖ\ \-']+.

Wait, wait, wait!!! Who the h*** just logged in with username “' OR 'a' is not null --”?
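It is easy to confirm that the last regexp really lets the attack string through. A minimal Java sketch, assuming a plain ASCII apostrophe in both the regexp and the attack string (the Scandinavian letters are written as Unicode escapes), and using hypothetical table and column names for a naively concatenated query:

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: the final, "friendly" username format also admits an SQL fragment. */
public class RegexpLoophole {
    // [a-zåäöA-ZÅÄÖ\ \-']+ with the Scandinavian letters as Unicode escapes.
    static final Pattern USERNAME =
            Pattern.compile("[a-z\u00e5\u00e4\u00f6A-Z\u00c5\u00c4\u00d6\\ \\-']+");

    public static boolean isValid(String candidate) {
        return USERNAME.matcher(candidate).matches();
    }

    // Hypothetical table and column names; this is exactly how SQL should NOT be built.
    public static String naiveQuery(String username, String passwordMD5) {
        return "SELECT id FROM accounts WHERE username = '" + username
                + "' AND password_md5 = '" + passwordMD5 + "'";
    }

    public static void main(String[] args) {
        String attack = "' OR 'a' is not null --";
        System.out.println(isValid("Oliver O'Hehir")); // true, as intended
        System.out.println(isValid(attack));           // true as well
        System.out.println(naiveQuery(attack, "whatever"));
    }
}
```

The built query contains WHERE username = '' OR 'a' is not null --, so the condition is always true and the -- turns the password check into a comment, even though the username passed validation.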

Sure, we might tighten up the regexp to block that specific attack string and any other malicious use of the format we can think of. But it is always the ones we did not think of that cause the trouble.

Well, as always we can think that “SQL Injection is solved by prepared statements”, but remember that Injection Flaws are a much larger class than SQL Injection. The same vulnerabilities might be there when doing LDAP access, when using parameters to construct file names (e.g. Directory Traversal), or when interpreting some Domain Specific Language (DSL). In any of these cases there might be a string that is perfectly legally formatted, but still attacks the structure of how the underlying resources are used.
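For the Directory Traversal case, a string like ../../etc/passwd is perfectly well formed (it would pass a loose format check such as [a-z./]+), so the defence has to look at structure: where the resolved path actually ends up. A minimal Java sketch using java.nio.file; the base directory and the class and method names are hypothetical, and it does not handle symbolic links (that would need something like Path.toRealPath on an existing file).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch: reject user-supplied file names that escape a base directory. */
public class SafeFileLookup {
    // Hypothetical directory that all user-supplied file names must stay inside.
    static final Path BASE = Paths.get("/srv/app/uploads").normalize();

    /** Resolves a user-supplied name, rejecting anything that lands outside BASE. */
    public static Path resolveInsideBase(String userSuppliedName) {
        // normalize() removes "." and ".." segments purely syntactically.
        Path resolved = BASE.resolve(userSuppliedName).normalize();
        // Path.startsWith compares whole name elements, so /srv/app/uploads2 does not match.
        if (!resolved.startsWith(BASE)) {
            throw new IllegalArgumentException("Path escapes base directory: " + userSuppliedName);
        }
        return resolved;
    }

    public static void main(String[] args) {
        System.out.println(resolveInsideBase("reports/q3.txt"));
        try {
            resolveInsideBase("../../etc/passwd");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The check is about the structure of the result, not the format of the input, which is exactly the kind of protection that indata validation alone does not give.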

Over to a completely different domain: FM broadcast and music radio. In the FM radio broadcast system you must be able to shut down the transmitters from a remote site. Unfortunately, there is only one way to communicate with the transmitters – via radio. The problem was solved by defining a specific sequence of audio blips (very precise on frequencies, duration, and interval) and denoting that sequence the meaning “shut down the transmitter”.

The pioneering Swedish rap group JustD put that exact sequence as the final beat on one of their songs, without telling anyone. They must have laughed all the way home from the studio. That song has been played on Swedish radio exactly once.

The JustD track hack is a wonderful example of exploiting an Injection Flaw: there is no way to escape it “in band”. I have the same gut feeling about indata validation and SQL Injection.

No matter how we structure the indata model, there might always be some data that actually is valid indata, but causes the system to crash.

So, indata modelling and validation in all its glory: however necessary it is for upholding security, it is not sufficient.

Yours

Dan

Thursday, 22 October 2009

Util Methods Do Not Work

Dear Junior

When writing the validation logic for the username:

  public boolean isValid() {
    return username.matches("[a-z]+");
  }

I suddenly heard a distant scream: “Why did you not use the util method for validation?” Ehhrr … sorry … which method? Ohhh, over there … in the se.xyz.services.util.stringutils package there is a util class StringValidationUtil with a validation method:

  public class StringValidationUtil {
    static public boolean logincheck(String username) {
      if (!(username.length() > 0)) return false;
      for (int i = 0; i < username.length(); i++)
        if (!Character.isLowerCase(username.charAt(i)))
          return false;
      return true;
    }
  }

I am sorry, I guess I just didn’t find it.

It is strange that I did not find it, because it is actually called as part of account creation when registering a new user. Did I not look for it properly?

Well, it is also strange that I did not find the method in se.xyz.utils.security.AccountTransformUtils, because there you can find:

  static public boolean okNewUsername(String username) {
    boolean result = true;
    if (username.length() == 0) result = false;
    for (int i = 0; i < username.length(); i++)
      result = result && Character.isLowerCase(username.charAt(i));
    return result;
  }

That method, by the way, is also called as part of the check when someone wants to change their username. Did I not look properly for that one either?

And of course there are some more methods in se.xyz.accmgm.AccountUtil and in the ever-present se.xyz.util.Util that all basically do the same thing - check that an account name has the proper form.
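“Basically the same thing” hides real differences. Even the implementations in this letter disagree: the util methods use Character.isLowerCase, which accepts any Unicode lowercase letter, while the regexp [a-z]+ only accepts ASCII a to z. A small Java comparison; the class and method names are hypothetical, and loopCheck mirrors the loop in logincheck.

```java
/** Hypothetical comparison of two "equivalent" username checks. */
public class DivergingValidators {
    // The regexp-based check: ASCII a-z only.
    public static boolean regexpCheck(String username) {
        return username.matches("[a-z]+");
    }

    // Mirrors StringValidationUtil.logincheck: any Unicode lowercase letter.
    public static boolean loopCheck(String username) {
        if (username.isEmpty()) return false;
        for (int i = 0; i < username.length(); i++) {
            if (!Character.isLowerCase(username.charAt(i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // "\u00e5sa" is "åsa" and "stra\u00dfe" is "straße".
        String[] candidates = {"danbj", "\u00e5sa", "stra\u00dfe", ""};
        for (String candidate : candidates) {
            System.out.printf("%-10s regexp=%b loop=%b%n",
                    "\"" + candidate + "\"", regexpCheck(candidate), loopCheck(candidate));
        }
    }
}
```

So whether “åsa” is an acceptable username depends on which util method the programmer happened to find.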

My real-life record was a util class that contained five different implementations of checking that a string was a date in the format “YYYYMMDD” – and between those implementations there were subtle differences in how they handled some strange cases. By the way, there were also three more implementations in another, slightly differently named util class.

So, how come this multitude of util methods? They are simply not found! The programmer who needs to validate that there are only lowercase letters in the string at hand will probably look for the needed method for ten to thirty seconds, after which she will implement it herself – after all, it is not that difficult. Then, to make “my nice method helpful for everybody else”, it is moved to some util class.

As a side note, most util methods are ‘static’. To me, ‘static’ in an OO program means “homeless”: those methods could reside equally well in any other class. And residing in some obscure hideaway package does not make them easy to find.

For a method to be used, it must be in the middle of the road where the programmer is going. That is what object orientation is good at: the methods hang on the data you have in your hands, so they are easy to find.

But unfortunately, static util methods do not work that way. They simply do not work.

Yours

Dan

ps A better place to put validation is inside the corresponding value object.

Wednesday, 14 October 2009

Ensuring Indata Validation

Dear Junior

Creating a username class and a validation method has taken us a fair way towards solving SQL Injection by focusing on a domain model API that is both easy to use correctly and hard to use incorrectly. I would say that so far we have managed to make the API easy to use.


  // Service method signature (body not shown)
  Integer authenticate(Username username, String passwordMD5);

  public class Username {
    // final making it immutable
    public final String username;
    public Username(String username) { this.username = username; }
    public boolean isValid() { return username.matches("[a-z]+"); }
  }

What remains is to ensure that indata validation actually is done. I can see two choices: either putting validation inside the authentication service, or enforcing validation before the call to the authentication service.

Let us first look at putting validation inside authentication.


  /** Authenticates a user with a given password.
   * @throws IllegalArgumentException if username invalid
   */
  Integer authenticateWithUsernameValidation(Username username, String passwordMD5)
          throws IllegalArgumentException, SQLException {
    if(!username.isValid())
      throw new IllegalArgumentException(
              "Cannot authenticate with invalid username: " + username.username);
    ...
  }

This definitely hardens the interface – now there is no way to skip validation upon authentication. However, the same trick has to be used in every service method around, including “create new account”, “change account username” and all those that are to come in the future. The risk is high that the small isValid call will be missed somewhere – and one hole is all an attacker needs.

Another drawback is the rather awkward “throws IllegalArgumentException” which feels like a very late validation – should not such validation be made much earlier, preferably up in the presentation and client tiers?

An alternative is to not allow invalid usernames to be constructed at all:


  @Test(expected = IllegalArgumentException.class)
  public void shouldNotCreateUsernameFromInjectionAttackString() {
    new Username("' OR 1=1 --");
  }

This requires the constructor to do the validation on the inside, throwing an exception if given an invalid username candidate.


  public Username(String username)
    throws IllegalArgumentException {
    this.username = username;
    if(!isValid())
      throw new IllegalArgumentException();
  }

Now we also need some way to validate from the outside without taking the pain of provoking and handling an exception, so finally there will be a static method after all:


  public static boolean isValid(String username) {
    return username.matches("[a-z]+");
  }

Of course, the old methods and constructor should be refactored to uphold the don’t-repeat-yourself (DRY) principle. Interestingly enough, this will make the isValid() method consistently return true – so I guess we can delete it from the class and inline it wherever it was used. That is, unless we for some bizarre reason want a method that explicitly tells the rest of the world that “this object is always valid”.
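After that refactoring, the class might end up looking like the following sketch. It is an assumed final shape rather than code from the letter; in particular, the null check in isValid(String) is an addition so that a null candidate is reported as invalid instead of causing a NullPointerException.

```java
/** Sketch of Username after the DRY refactoring: invalid instances cannot be constructed. */
public final class Username {
    // final making it immutable
    public final String username;

    public Username(String username) {
        // Single source of truth for the format: the static check below.
        if (!isValid(username)) {
            throw new IllegalArgumentException("Not a valid username");
        }
        this.username = username;
    }

    /** Lets clients check a candidate without provoking and handling an exception. */
    public static boolean isValid(String username) {
        return username != null && username.matches("[a-z]+");
    }
}
```

With this shape, new Username("' OR 1=1 --") throws IllegalArgumentException, while the presentation tier can call Username.isValid(input) first and give the user friendly feedback.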

I definitely prefer this latter “strictly-validated-value-object” style over the “validating-service-methods” style. It creates an API that, besides being easy to use correctly, is also hard to use incorrectly. It “guides” the client-side programmer without being intrusive or obstructive about it.

In some sense, it “enforces” a behaviour upon the client-side programmer. However, that does not trouble me. If someone is nice to me and doesn’t cause me trouble, I see no obstacle in letting her have her way.

Yours

Dan

Monday, 12 October 2009

Avoid Synonyms in the Ubiquitous Language

Dear Junior

When making the rounds to establish a new term in the ubiquitous language of a system, it is very tempting to start accepting synonyms.

Perhaps we tried to make 'username' explicit in the model. We then need to settle what a username is, and how it is checked against its validation rules. Among the programmers, we have used “username”. When we get over to the GUI designers, they tend to talk about it as the “handle”. Later on we find out that the tech writers dig the term “alias”, and that is what they have used in the manual. It might seem tempting to say “we all mean the same thing, so let's accept these as synonyms”, after which we write three entries in our glossary:

“username, …”,

”handle, see username”, and

“alias, see username”.

So, is that so bad? I guess we can handle three words meaning the same thing. Well, the problem is not the individual words; it is the language and the combinatorial explosion.

We also need to define what we call it when we check that a username fulfils the formatting rules. It turns out that we preferred “validate”; however, those who were on the Gazunga project (regardless of department) seem to prefer “checkup”. Let us accept these as well and add to our glossary:

“validate …”, and

“checkup, see validate”.

Even if each word has a very limited number of synonyms, we now have lots of synonyms for phrases (remember that while a glossary is about words, the ubiquitous language is about phrases). There are six synonymous phrases for “validate username”:

“validate username”
“validate alias”
“checkup handle”
“validate handle”
“checkup alias”
“checkup username”

Further on, the user account is referred to by some as “pref-set” (preference setting space), and by others as “area” (as in private work area). So now there are no fewer than eighteen ways to phrase “validate the username of the account”.
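The eighteen comes straight from multiplication: two terms for the action, times three for the username, times three for the account. A tiny Java sketch that enumerates them; the class name and the phrase template “verb the noun of the container” are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: every combination of synonyms yields another "dialect" phrase. */
public class SynonymExplosion {
    public static List<String> phrases(List<String> verbs, List<String> nouns, List<String> containers) {
        List<String> result = new ArrayList<>();
        for (String verb : verbs) {
            for (String noun : nouns) {
                for (String container : containers) {
                    result.add(verb + " the " + noun + " of the " + container);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> all = phrases(
                List.of("validate", "checkup"),
                List.of("username", "handle", "alias"),
                List.of("account", "pref-set", "area"));
        System.out.println(all.size()); // 2 * 3 * 3 = 18
        all.forEach(System.out::println);
    }
}
```

Each further synonym multiplies, rather than adds to, the number of ways to say the same thing.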

Et cetera, you get it.

So, letting synonyms into the ubiquitous language quickly leads to having not one language, but a lot of dialects that quickly drift apart to form separate languages – and the ubiquity is gone.

Of course there are occasions when you have to accept synonyms. If there is an established terminology outside the project, there is a point in adhering to it – but what do you do if there are several competing standards? For example, the finance department might want to use the accepted term “imbursement”, but the marketers insist that the established term in our customer base is “money-forward” – both backed by fully acceptable external bodies. In these rare cases we have to accept synonyms, but I still advise you to denote one of them as the primary term, and only use the other when necessary. In the glossary it might say “money-forward (aka imbursement), a payment made by … in exchange for… (imbursement preferred by finance department)”.

By all means, do use synonyms when they are absolutely necessary, but be very restrictive. I promise, if you allow synonyms at an early stage, then confusion will arise somewhere down the road.

Yours

Dan

ps When trying to avoid synonyms you in many ways have the same mind-set as when establishing a canonical data format in the model - but you work in slightly different areas.
