Wednesday 7 March 2012

DRY and Duplicated Code

Dear Junior

In my last letter I wrote about how the DRY-principle (Don't Repeat Yourself) talks primarily talks about the number of times an idea is expressed in the code-base, not about text-repetition as such.

If nothing else, the intention of DRY becomes clear when you go back the source that caused the DRY-meme to spread within the programming community: the Pragmatic Programmer by Andrew Hunt and David Thomas, published in 1999.

"Every piece of knowledge must have a single, unambiguous authoritative representation within a system" 

(Needless to say, when reading this book I found its core ideas to be extremely well aligned with my own ideas about programming - and the rest of the book have had profound impact on my view of the craftsmanship aspects of programming)

Now, to put my last letter in perspective it must be pointed out that the absolutely easiest way to find violations of the DRY principle is to search your codebase for duplicated text. It is very likely to find multiple places where the same idea is expressed over an over again using the same technical construct.

If for example you have an order system, there might be an attribute "payment" on the order telling when the order was paid.

class Order {
   Date payment = null;
   Date payment() { return this.payment; }

Most probably orders are not allowed to be sent for shipping unless they are paid. So, very probably there will be code somewhere checking this condition.

if(order.payment() != null) // do shipping
else // do not allow shipping

It is also probable that the same check is done in a lot of other places throughout the codebase. What we have ended up with is that the same technical construction "order.payment() != null" is used to express the same idea "order is payed" in a lot of places - a clear violation of DRY. Most probably we will also find a lot of places checking the negation "order.payment == null".

Of course the codebase would benefit from encapsulating the interpretation of a nulled payment-field and putting that encapsulated interpretation in one place - preferably the Order class.

class Order {
   Date payment = null;
   Date payment() { return this.payment; }
   boolean isPaid { return order.payment() != null; }

The benefits are several. To start with it is now possible to change the technical representation of an unpaid order without having to run around the code-base. We actually want all those places in the code to change at the same time - because they conceptually related things (actually they are conceptually the same thing).

An even more important benefit is that the API of the Order class becomes more expressive, making the client code more lucent.

if(order.isPaid()) // do shipping
else // do not allow shipping

Obviously, searching for duplicated text is an excellent way of finding candidate violations of DRY. If the code looks exactly the same, then those places will need to change at the same time - i e we have implicit couplings in the codebase that we risk breaking when we change.

There are excellent tools out there to detect duplicated code - some bundled in code-quality tools like Sonar, some available directly in your IDE. Most of these even have support for "fuzzying" where they detect that code is duplicated "apart from different naming of variables" for example.

Using these tools on an "average codebase of the street" will probably yield hundreds of places with duplicated code. Of these candidates, the vast majority of them will be violations of the DRY principle - especially the short code-snippets like "order.payment() == null" which are so easy to "just write" instead of looking for a method that does the job.

However, be aware that within that huge pile of DRY-violations there might also be a few "false positives". Those are the code-places that by accident happen to look the same way, but actually represent different and unrelated ideas. To "reduce the common code" in that case would only cause a irrelevant coupling between unrelated parts of the solution.

In summary, looking for duplicated code is an excellent "screening test" to find loads of candidates where the DRY-principle has been violated. However, the DRY principle is not about duplication of text, but about duplication of ideas. So, the two should be kept apart.

Even if duplicate code most often is a sign of DRY-violation, there are valid reasons to have the same code-text in two different places. 

Jérémie Chassai phrased it pretty well on Twitter: "You should repeat yourself if you say the same thing for two different reasons".

Get those tools, start searching for duplicates, and good DRY-hunting