Monday 4 August 2008

Javadoc as Self Defence

Dear Junior

When working with a code base I often get into the situation where I find a piece of code whose purpose I do not understand. I might see that it actually does something, because it is called from other parts of the system, but I cannot see why, or what purpose it fills. Such a piece of code runs a high risk of falling under the axe, being thrown away, and replaced.

Is that not unfair? How do I dare simply to discard someone else's piece of work and replace it with my own? How would I react if someone else did that to my carefully crafted solutions?

Every time a programmer is about to modify or extend a piece of code, that code stands in front of a higher court and has to justify its existence. If the code cannot convincingly plead its case, it will probably go into the land of beyond, i e into the version control attic. And in that trial, being some long-winded method with a non-descriptive name, and having neither javadoc nor unit tests, is equal to saying nothing in your defence.

But still, is that not unfair? Well, it might seem so, but obviously the original programmer did not think it worthwhile to spend one minute motivating the design. And in that case, what motivates me spending ten minutes trying to reconstruct the motivations? Is my time worth just a tenth of the other programmer's? To me this is very much about courtesy from one programmer to another - paying respect to each other's time.

To return to the code on trial: how do I try to save my code in future trials, when I might not be present in person? I try to equip it with the possibility to plead its case for itself. I give each public method at least one single sentence of javadoc that in plain language explains, or at least indicates, the purpose it fulfils. By just that I drastically increase the chances that the method will be understood and left to live.
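
As a sketch of what I mean - the method here is made up, just to show the shape of such a one-sentence plea:

/**
 * Archives orders older than the retention period, so that the nightly
 * batch does not have to scan years of inactive orders.
 */
public void archiveExpiredOrders() {
    // implementation omitted - the one-sentence javadoc above is the point
}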

Whenever possible - and that is not always the case when working deep inside already existing code - I try to strengthen the case by adding objective proof of the code's fitness, in the form of executable tests. Here the purpose of each test is to prove some aspect of the code, e g that in a set the removal of an element is independent of how many times it was added. Again, the names of the test methods should indicate the purpose of the test, e g testShouldDeleteElementFromSetOnRemoveEvenAfterMultipleAdds. Please check out some of Dan North's writing on behaviour driven development to get more ideas. But again, the tests will help plead the code's case and keep it alive for yet some time.
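
A minimal sketch of such a test, here using JUnit 4 and a plain java.util.HashSet just for illustration:

import static org.junit.Assert.assertFalse;

import java.util.HashSet;
import java.util.Set;

import org.junit.Test;

public class SetBehaviourTest {

    @Test
    public void testShouldDeleteElementFromSetOnRemoveEvenAfterMultipleAdds() {
        Set<String> set = new HashSet<String>();
        set.add("element");
        set.add("element"); // a set keeps only one copy, however many times we add
        set.remove("element");
        assertFalse("one remove should suffice, regardless of the number of adds",
                set.contains("element"));
    }
}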

So, to radically increase your code's probability of survival, ensure on your next commit that every public class, method, or field is equipped with a proper plea of self-defence in the form of at least one single sentence that explains its purpose. It is the least you can do for your creation.

Yours

Dan

Thursday 5 June 2008

Checked Exceptions as Disclaimers

Dear Junior

Almost all codebases I have seen have had issues with exceptions. Considering that they are kind of mandatory to cover in even the first introductory Java course, it is surprising how poorly understood they are.

My favourite way of thinking about exceptions is to see them as disclaimers to a contract.

For example, if I employ a plumber to do some work with the plumbing in the bathroom, we will agree on some kind of contract. The contract does not actually do the work, but it specifies the conditions under which the work is performed. So, to start with we agree on what the work will result in, in this case fixing the bathroom floor sink.

But then the plumber mentions that houses built around the time my house was built sometimes have very strange plumbing. Instead of the pipes being made of metal with clockwise fittings, these have ceramic pipes and anti-clockwise fittings. There is really nothing wrong with the ceramic, anti-clockwise plumbing - it does its job perfectly well, it is just very unusual.

The plumber points out that she has neither the competence nor the tools to fix that kind of plumbing. Therefore she asks me whether I can ensure that mine is the standard type of plumbing. Frankly, I have no idea. The negotiation stalls.

Now I have a choice. First choice is to employ the plumber to come and investigate the type of plumbing, and thereafter be sure that I can employ her to actually fix the plumbing. Second choice is to take the risk, employing her and saying "if it is the strange type, I understand if you can't do the job", and then have a plan to handle the situation. This is fine with the plumber, but she insists on writing this into the contract as a disclaimer: "the employer is not obligated to investigate the type of pipes in advance; however, if they are of non-standard type, then the plumber is not obliged to do the job".

After judging what would suit me best, I settle for the second option: a contract where I do not need to check in advance, where there is a disclaimer, and where I have to handle the unusual situation if it occurs.

Let us now take this story and put it in the context of designing classes, distributing responsibilities and defining separation of concerns in an object oriented system. We then end up with the contract metaphor described by e g Bertrand Meyer in "Object Oriented Software Construction", often referred to as "Design by Contract".

The first contract just describing the job would probably end up as an interface defining the method which the client can call.

interface Plumber {
    void fixFloorSink(Bathroom bathroom);
}

The result of calling the method is a state change in the bathroom. In Meyer's terminology, this is a post-condition and we should document it in the javadoc. Even better, we should write a unit test, but that is kind of beside the point here.

/**
 * Fixes the floor sink.
 * postcondition: bathroom.drainsWaterOnFloor() == true
 */
void fixFloorSink(Bathroom bathroom);

Let us add the disclaimer. Obviously, this will show up as an exception. In the spirit of Domain Driven Design, we try to name the exception in a way that makes sense in the domain.

void fixFloorSink(Bathroom bathroom) throws NonStandardPipeException;

What kind of exception should NonStandardPipeException be? Well, according to the contract, the caller/client/house-owner/I does not need to check the condition in advance, but has to handle the situation if it occurs. This maps neatly onto try/catch, so we make NonStandardPipeException an ordinary, checked exception.

class NonStandardPipeException extends Exception {}

A small note here is that this situation is not what Meyer calls a pre-condition. A pre-condition is something that has to be true before the method is called, and which the caller is obliged to check. But that will have to be the subject for some other letter.

Now the client side - the house-owner, i e I - will be forced by the compiler to take responsibility for its side of the contract: handling the situation if the disclaimer is invoked.

void takeCareOfNonDrainingBathroom() {
    try {
        findPlumber().fixFloorSink(house.upstairsBathroom);
    } catch (NonStandardPipeException nspe) {
        fixPipesYourself();
    }
}

Note the ambitious (or desperate) attempt to handle the exceptional situation. The important thing is that when the try-catch block has finished, the resulting situations are equivalent, regardless of which path the execution took - the contract of the method 'takeCareOfNonDrainingBathroom' has been fulfilled. If we cannot achieve that, we should in our turn consider revising our contract, adding a disclaimer and throwing an exception of our own.
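
To be concrete: if we have no way of fixing the pipes ourselves, that could look something like this sketch, where BathroomOutOfOrderException is just a made-up name for our own disclaimer:

void takeCareOfNonDrainingBathroom() throws BathroomOutOfOrderException {
    try {
        findPlumber().fixFloorSink(house.upstairsBathroom);
    } catch (NonStandardPipeException nspe) {
        // we cannot fulfil our contract either, so we raise a disclaimer of our own
        throw new BathroomOutOfOrderException(nspe);
    }
}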

So, an exception is our way to realise the concept of a disclaimer in a contract.

Yours

Dan

  • Bertrand Meyer, Object-Oriented Software Construction


Thursday 8 May 2008

Version of Data in Database

Dear Junior
I recently had a database versioning scheme go into production. And by database versioning I do not mean keeping track of what patch level the database management system (in this case IBM DB2) is at, but versioning of the business data and its structure.
If you never or seldom change the database definition, you can easily keep track of changes. However, with agile-style development we constantly change our system, either by adding functionality or by refactoring the structure, and then we need to change the database as well.
Sooner or later we run into the situation where we want to test some functionality, get a backup from production, and then wonder: "OK, so what structure changes do I need to make to get this usable?" It might be easy to see what tables or columns are missing and should be added. However, we then rely on a fresh night's sleep and good morning coffee, and still we might miss the more subtle changes. And, trust me, this gets even worse if it is not a production copy but some test database that has been lying around, and where disparate groups have performed random experimental changes "just to see what it would look like" - changes that later may, or may not, have made it to production. In that case you should expect a turmoil of inconsistencies.
The best way I have found out of this dilemma is to assign each version of the database structure a number. Of course you need some place to start, and on this last occasion we selected what was in production on 2008-01-01 and announced that as structure version zero. The idea is that every change of the database must declare what version it is a change to. So, instead of saying "add column FOO to table BAR", you must say "if the database is in version 32, you can get it to version 33 by adding column FOO to table BAR". That change would then become the SQL code for "patch 33".
The information on database structure and the patch scripts are of course kept in our Subversion repository, in a directory of their own, "database", with subdirectories "baseline", "patch001", "patch002" etc.
The baseline in our case is nothing more than the DDL for recreating the database structure as it looked in production on 2008-01-01, plus scripts for inserting "base data" that has to be there and is not managed by Canatala (which is the name of the application).
In this case the application connected a bank to several insurance providers; the base data it needed was information about the insurance companies. (In fact, I have substituted the application and the domain for non-disclosure reasons, but the problem is the same.) Anyway, this data was not managed by the application, but imported as a batch on first deploy, and then enhanced by inserts at later deployments.
To get really down to earth the directory contains twelve files.
  • baseline-sql01-create-structure-ddl.sql
  • baseline-sql02-insert-basedata-insurance-providers.sql
  • baseline-sql03-insert-basedata-xyz.sql // some other base data
  • baseline-sql04-insert-basedata-uvw.sql // yet more base data
  • baseline-sql05-sp-PROVIDERSYNC.sql // a stored procedure
  • baseline-sql06-sp-XYZ.sql // another stored procedure
  • ...
  • baseline-sql12-sp-UVW.sql // last stored procedure
Note that the files are named in a way that enables a script to automatically apply them in the right order, using the part of the filename up to the second '-'. At the same time the file names describe to a human eye what the SQL is about, without disturbing the script. We were kind of proud of that balance.
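A rough sketch of how such a script might look in Java - the class name is made up, and actually executing the SQL is left out:
import java.io.File;
import java.io.FilenameFilter;
import java.util.Arrays;

public class PatchScripts {
    /** Returns the sql files of a patch directory in the order they should be applied. */
    public static File[] scriptsInOrder(File patchDir) {
        File[] scripts = patchDir.listFiles(new FilenameFilter() {
            public boolean accept(File dir, String name) {
                return name.endsWith(".sql");
            }
        });
        // The filenames sort correctly on the part up to the second '-',
        // e g "baseline-sql01", so a plain alphabetical sort is enough.
        Arrays.sort(scripts);
        return scripts;
    }
}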
The "patch" directories contain all the SQL-files needed to take a database from one schema version to the next. As an example, we shortly thereafter came across a new concept "insurance delegate", which we also needed to store in the database - thus a new table INSURANCE_DELEGATES.
This table also needed to be filled with some data to match the existing insurance providers. All this became patch 002, so the "patch002" directory contains the files needed to take a version 001 database and upgrade it to version 002.
  • patch002-sql01-create-provider-delegate-table.sql // creating new
    table
  • patch002-sql02-insert-provider-delegates.sql // filling the new table
  • ... // another script excluded, you will soon see what it is about
I think you get the picture by now. If you need to upgrade a database, just run all the scripts in all the patch directories of a "higher" version than the one you have. If you need to create a database from scratch, use the baseline and then run all the patch scripts.
But how do you know which patch scripts to run, i e how do you know what patch level you are at?
The really good part is that this version information should be *part of the data*. In this last case we added a new table defined by:
--- Table of database versions (current and history): when each patch was
--- deployed, and what it consisted of
CREATE TABLE CANATALA_DB_VERSION (
    CANATALA_DB_VERSION BIGINT,
    PATCHDEPLOYTIME TIMESTAMP,
    DESCRIPTION VARCHAR(256)
);
And then, whenever you apply a patch, you change the data in this table to reflect the change of patch level.
Actually, there are two ways of using a table like this. Either you just save the information about the current version, or you save the entire history. I have done both, and have settled for saving the history, because it really helps when you "find" some database lying around and wonder about its life hitherto. Next time I will probably also extend it with some kind of environment information, e g an extra column storing the IP address of where the database was running when it was patched, to distinguish whether it was patched in production or upgraded after being copied from production.
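With the history variant, finding the current patch level of a database you have stumbled upon is a single query. A minimal sketch in plain JDBC, assuming you already have a connection open:
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class DbVersion {
    /** Returns the highest patch level recorded in the version table. */
    public static long currentDbVersion(Connection connection) throws SQLException {
        Statement statement = connection.createStatement();
        try {
            ResultSet result = statement.executeQuery(
                    "SELECT MAX(CANATALA_DB_VERSION) FROM CANATALA_DB_VERSION");
            result.next();
            return result.getLong(1);
        } finally {
            statement.close();
        }
    }
}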
As you might have guessed, this table was not in the database when we started, so the CREATE is not part of the baseline-sql01-create script. We had to introduce it.
Actually, introducing the version table was our first patch, patch000. So directory patch000 contained two files: one introducing the structure, and the other inserting the data saying that "this database is now at patch level 0".
  • patch000-sql00-create-version-table.sql
  • patch000-sql01-insert-dbversioninfo.sql
The first file simply contains the CREATE TABLE I mentioned earlier. The second file contains the INSERT needed to put "version 0" into the table:
INSERT INTO CANATALA_DB_VERSION
(CANATALA_DB_VERSION,PATCHDEPLOYTIME,DESCRIPTION)
VALUES
(0,CURRENT TIMESTAMP,'Starting versioning of database');
I guess this makes it pretty obvious what is in the file I left out in patch002 - it is the db-version insert for that patch.
  • patch002-sql03-insert-dbversioninfo.sql
INSERT INTO CANATALA_DB_VERSION
(CANATALA_DB_VERSION,PATCHDEPLOYTIME,DESCRIPTION)
VALUES
(2,CURRENT TIMESTAMP,'Adding support for insurance delegates');
If we follow this scheme, we can take a look in any database and see its patch-history and decide what patches we need to bring it to a usable patch-level.
Of course, this model is not perfect - there are several ways to improve it. For a start, there is no guarantee that each patch will actually contain a db-version insert. One way to fix that could be to have the patches applied by some kind of special tool, which could check that the patch directory actually contains such a file and otherwise refuse to apply the patch. Another drawback is that it is possible to apply a patch to a database at the wrong level, e g applying patch 34 to a database at version 31. You might even accidentally apply the same patch twice. This is also something a tool could check.
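A sketch of the kind of guard such a tool could make, building on the currentDbVersion method sketched above (the method name and its wording are made up):
/** Refuses to apply a patch unless the database is at exactly the preceding level. */
public static void applyPatch(int patchLevel, File patchDir, Connection connection) throws SQLException {
    long current = currentDbVersion(connection);
    if (current != patchLevel - 1) {
        throw new IllegalStateException("Database is at level " + current
                + ", refusing to apply patch " + patchLevel);
    }
    // ...run the scripts of patchDir in filename order, ending with the db-version insert...
}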
Even if there are plenty of ways to improve this model it is a really good way to start.
But the best part has not even started. Once you have versioning of the database under control, you suddenly have the possibility to start working with the database in a new way - such as refactoring the database structure. We can get agile in the database tier as well.
Once you have gotten your toes wet, there are lots of places to dig deeper. I would like to point out two. K Scott Allen has written five blog posts on database versioning that are really well worth reading [see Jeff Atwood's blog at http://www.codinghorror.com/blog/archives/001050.html]. Thereafter I really recommend "Refactoring Databases" by Scott Ambler and Pramod Sadalage, where they really show the power of the approach.
Yours
Dan
ps A great thing about having a versioned database is that you can start refactoring it to mitigate the mismatch between the domain model and the database design.

Saturday 5 April 2008

Keeping Backlog in Shape Using a Backlog Shortlist

Dear Junior

Most products and projects tend sooner or later to end up with a really long list of things to be done - the Product Backlog, to use the Scrum terminology. Keeping the backlog in shape can be a really tedious job. This has certainly been true at every place I have worked. We need something more agile.

The backlog can take many shapes - Excel spreadsheets, Jira items, wiki pages or all of the above - please try to collect them in one place. In an ideal world, every item on the list would be well written, well understood and reliably estimated (preferably using "points", "trees" or some other unit not based on physical time).

We cannot really stop the list from growing. The reason the list grows long is simple: it is easier to generate ideas than to realise them. Either you inherit a long backlog or you will soon have one. So, we cannot keep it from growing, and we cannot spend all our time walking through the list polishing each item. Doing so would be both frustrating and a waste of time.

However, there are some reasons why estimating stories is important, and those are the ones we should focus on, so that we get most of the benefits without paying the full price. I see three main good effects of estimation.

The most obvious benefit is that when the product owner selects what should be developed next, she needs to know the cost associated with each item - to be able to get the most bang (business value) for the buck (development effort). However, estimation also forces the developers to understand what is requested - which is helpful so they do not have to spend time within the sprint finding out (of course, they will need to refine their understanding and iron out wrinkles together with the business people, but at least they get the scope right). Thirdly, and not much talked about: estimation also drives a discussion on architecture and design, if allowed to. To estimate you must have a rough idea about the design, and if people's perceptions of that differ a lot, they will give very different estimates. This is an excellent way to find areas where the team needs some time to sync their ideas. Unfortunately, project leads do not always realise the necessity of this.

To get the benefits of estimation without paying the price of maintaining 1000 stories (change requests, bugs and enhancement ideas), I use what I call a "Backlog Shortlist". The Backlog Shortlist is simply the upper-most (i e highest-prioritised) part of the product backlog. These are the items that the product owner keeps in crisp condition and ensures are properly ordered according to importance. E g, when working with the BDD story format I usually insist that these should have at least two example scenarios, which also gives a good idea of how the story should be demoed. Outside the shortlist I am far more tolerant of sliminess and other non-crispiness.

At each planning occasion I want to walk through all the items on the shortlist, ensure that everybody understands them, and re-estimate them. This re-estimation further improves the benefits of estimation I mentioned: the product owner's awareness of development cost, scope understanding, and synchronised ideas about design within the team.

Firstly, the cost estimates benefit a lot from re-estimation. From time to time the team realises that a story has become much easier because of work already done, and the estimate might drop significantly. For example, I have experienced that during one sprint we implemented a story that made us create some objects. At the next planning we realised that another story had become almost trivial now that the object structure was in place, and the estimate dropped. I do not remember the precise numbers, but it was something like dropping from thirteen to two.

Secondly, repeating the estimation a few times while the story is just a hot candidate, but not yet up for development, gives the product owner repeated opportunities to get feedback about the cost, and perhaps modify the story by simplifying it or breaking it into parts. It also keeps the developers up to date about what is hot, so that the selection for the next sprint does not come as a surprise. They already recognise the topic and are pretty familiar with it.

Thirdly, as long as the team has a common understanding of the architecture (should we use a domain-modelled OO structure, or go straight to the database through DAOs?) the re-estimation is fast. However, if the team has very different views on how a feature should be implemented, it is better to resolve those issues before it is selected for a sprint.

To get these benefits I try to size the shortlist to roughly double what we can do in a sprint. That way, each item will be re-discussed and re-estimated twice before being selected for development. At the same time, the re-estimation does not become too long and burdensome.

In short: using a backlog shortlist is just a way to avoid premature exactness, or the other way around: to enforce exactness when it becomes productive but not before.

So, next time you try to get a backlog in shape: focus the effort on where it gives the best result - on the shortlist.

Yours

Dan

Tuesday 19 February 2008

Psychology of Tests, Testing and TDD

Dear Junior

There has been a lot of talk about the benefits of test driven development, but there is one element that I find interesting and have not seen referenced - the psychological process.

Pretend that we dive into the head of a programmer (e g me) and see what happens during testing. Let us first have a look at "classical" after-the-fact tests, and then at the TDD way of writing the test before the code.

If I write some functional code and consider myself finished, I will without doubt feel immensely proud of myself - in other words, my ego is on a high note. Then I sit down to test it, and let's see what can happen: either I find bugs and flaws, or I don't. If I do not find any bugs, I do not get any extra credit in my own books - I have been exactly as good as I thought I was (though I might feel that I have wasted some time). If I do find bugs, I get a slap in the face - I have not been as good as I thought I was (and have committed hubris, which is a severe sin on the list of things considered sins in Sweden). So, even if testing will improve the general level of the result, the activity of testing per se will never give an upside: it can at best have a zero outcome, and at worst a big minus. Depressing.

So, how do I treat activities that give me no upside, but possible downsides? Naturally, I tend to avoid them, and over time minimize them.

So, I tend to shy away from testing and writing tests, even though I am firmly convinced of their necessity. Depressing.

Driving the development through tests breaks that negative feedback loop. If I write the test before the code, naturally it will fail: I have no other expectation, and the red bar is not a failure. Later on, I write some code which makes the test go green, and I have actually finished on a high note. Great!

Well, actually I have not finished, as anyone TDD-savvy would quickly point out. I have not yet refactored away duplication and other code smells. But doing so will satisfy another part of my ego: the code-aesthetics lover that resides inside me. So, satisfying that part of my ego will end the job on a yet higher note.

OK, to be brutally honest: is there no part of TDD's "red-green-refactor" that has the same "can only go wrong" drawback as traditional test-last? Well, there actually is: the refactoring. At best it says: you did not mess up; at worst: you thought you were so smart but you failed. So, how come I do not tend to neglect refactoring?

Well, I have already given the answer: I love refactoring, if nothing else for the pure love of nicely crafted code. But also, the refactorings are not only the last cleanup of one part of the job. They are most often also the stepping-stone from which I launch into the next part of functionality to write - so I get immediate feedback from the refactoring as well.

I think writing tests in general, and unit tests in particular, is really important. Considering how many agree, and how seldom it is practised, we must look for explanations: ignorance, arrogance or malice do not suffice. Looking at how we experience "writing tests" has helped me shed some light on the mystery.

Whatever the case, I think one of the things that drives me to do TDD is that it makes all parts of the development cycle fun.

Yours

Dan

Wednesday 13 February 2008

Transparency of Business Value

Dear Junior
The story format from Behaviour Driven Development is wonderful in many ways. I have not yet found another format that even comes close to the semantic density of
As <role>
I want <feature>
So that <benefit>.
However, I would like to dive into one particular aspect.
If you have three things collected it is often interesting to focus on two of them and pay less attention to the third. During my student years there was a joke that it was easy to be well-dressed, social, and sober - but you had to pick two out of three.
In this case we have the role, the feature, and the benefit. Please play the game of fading out either the role or the feature. I want to dive into what happens if you fade out the feature itself and focus on the role and the benefit. Benefits can be measured in millions of ways, but let us make it easy for ourselves and convert them to monetary business value. If you need help doing that, please contact your closest economist; they have a broad selection of models for company value calculations to pick from.
"As <role> (...) so that <business value>": If we look at the last part, it is a value experienced by someone - and that can be converted to money. If we look at the first part, it tells us which group will benefit. Multiply the size of the group with the value per person, and you have the potential strategic business value of the story.
As an example: last fall I discussed this during a mountain walk with Bryan Basham, who works for StillSecure on the Cobia system for configuration of network nodes (e g routers). One of their stories read roughly "As a network admin I want to apply a setting to a group of network nodes, so that I do not need to configure each of them manually". So, why should a customer pay for this functionality? Well, manual repetition of the same configuration converts really easily into a cost of labour hours. So, the benefit of each network admin saving a few hours every month translates, through multiplication, into a monthly reduction of costs, which through not-too-advanced economic calculations can be converted to a present value. Voila!
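A back-of-the-envelope sketch of that arithmetic, with numbers that are entirely made up for illustration:
class StoryValueSketch {
    public static void main(String[] args) {
        double hoursSavedPerAdminPerMonth = 3;   // hypothetical
        double labourCostPerHour = 80;           // hypothetical hourly rate
        int networkAdminsAffected = 500;         // hypothetical size of the <role> group
        double monthlySaving =
                hoursSavedPerAdminPerMonth * labourCostPerHour * networkAdminsAffected;
        // 3 * 80 * 500 = 120 000 per month - discount that stream of savings to get a present value
        System.out.println("Monthly saving: " + monthlySaving);
    }
}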
The benefit of actually putting a figure on the value of a story is that you get clearer transparency of the business value throughout the organisation. When the product owner sets priorities, she can look at the estimated value as well as the estimated cost (the number of story points we will spend implementing the story). Another way to use it is to clear up communication with external stakeholders. E g, a seller might have gotten his will through by being the loudest shouter. However, when he is forced to put a figure on the value, he can also be held accountable if he gets the feature but fails to deliver the extra sales. It can also be fruitful for the CTO/CIO to sum up the value of all stories delivered by development in their quarterly report to the CEO. That way it might become clear that IT development is not just "all costs"; it actually delivers business value, or at least potential value for someone else to realise into actual cash.
Unfortunately, many organisations are not brave enough to see things this clearly. For some reason they prefer to see development as a cost, funded through a limited budget, which often does not deliver what it should.
Yours
Dan