Friday, 17 July 2009

Canonical Data Format for Domain Model Data

Dear Junior

When accepting data from a user, the same data can arrive in many different forms. E.g., a phone number can be sent as "0709-158843", "0709 158 843", or "+46(0)709158843", all denoting the same thing: the phone number of my mobile phone. This can be a real hassle. Say that one user stores the phone number as "0709 158 843" and another searches for people with phone number "0709-158843"; then there is a risk that the search finds no match, even though it is actually the same phone number.

This situation can get even subtler: non-7-bit-ASCII characters like the Swedish letters å, ä, and ö can be represented as bytes in several different ways. Thus, there is always a risk that the search form does not use the same encoding as the storage in the database. The result might again be "no match", even though the two users (one entering a name like "Öberg", another searching for it) punched exactly the same keys in the same sequence, perhaps even on the same physical keyboard.
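To make the "same keys, different bytes" problem concrete, here is a minimal Python sketch (my own illustration, not taken from any particular system). It assumes the sender's charset is known at the boundary; the function name `to_canonical_text` is hypothetical. It shows "Öberg" as three different byte sequences: Latin-1, UTF-8 with a precomposed Ö, and UTF-8 with "O" plus a combining diaeresis. It then maps all three to one string by decoding with the declared charset and applying Unicode NFC normalization.

```python
import unicodedata

# "Öberg" spelled with a precomposed Ö (U+00D6)
NAME = "\u00d6berg"

# Three byte sequences that all represent what the user typed
latin1 = NAME.encode("latin-1")                                   # b'\xd6berg'
utf8 = NAME.encode("utf-8")                                       # b'\xc3\x96berg'
utf8_decomposed = unicodedata.normalize("NFD", NAME).encode("utf-8")  # b'O\xcc\x88berg'


def to_canonical_text(raw: bytes, charset: str) -> str:
    """Hypothetical boundary helper: decode with the charset the sender
    declared, then NFC-normalize so composed and decomposed forms agree."""
    return unicodedata.normalize("NFC", raw.decode(charset))


if __name__ == "__main__":
    print(latin1 == utf8)                                   # the raw bytes differ
    print(utf8_decomposed.decode("utf-8") == NAME)          # decoding alone is not enough
    print(to_canonical_text(latin1, "latin-1") == NAME)     # canonical form matches
    print(to_canonical_text(utf8_decomposed, "utf-8") == NAME)
```

Both steps belong at the input border. Once text is inside the model, comparing two strings is a plain equality check.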

To get around this, I usually decide on a canonical data format to be used in the model, which naturally will also be used in the code that implements the model. By this I do not mean technical issues like which Unicode character encoding to use; those are up to the programmers to decide. I mean what a phone number looks like "in its own nature", so to say, "beneath the different representations".

What a phone number should look like is a domain modelling issue, and should be decided together with the domain experts (user representative, product owner, whoever ...). It is often pretty efficient to bring up the problem explicitly: "We need to set a standard for what phone numbers should look like, otherwise the application will look like a mess, searches will fail, and integration with other systems will be a nightmare. I have a few suggestions for possible alternatives: is there any one you like better than the others, or is there another format you would suggest?" You will probably end up with something like "0709-158843", which can be abstracted to the regexp "0ddd-dddddd[d]*" or similar. This is your canonical form, and in Domain Driven Design terminology it will be part of your domain model.
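The pattern "0ddd-dddddd[d]*" is written in an informal notation where "d" stands for a digit. A minimal sketch of the same rule as a real Python regular expression might look like this. It is my own translation, and the names `CANONICAL_PHONE` and `is_canonical` are hypothetical. It reads the pattern as a leading 0, three more digits, a hyphen, then six or more digits. `[0-9]` is used rather than `\d` because in Python `\d` also matches non-ASCII digits.

```python
import re

# "0ddd-dddddd[d]*": a leading 0, three more digits, a hyphen,
# then six digits followed by any number of additional digits.
CANONICAL_PHONE = re.compile(r"0[0-9]{3}-[0-9]{6,}")


def is_canonical(phone: str) -> bool:
    """True only if the whole string is already in the canonical form."""
    return CANONICAL_PHONE.fullmatch(phone) is not None


if __name__ == "__main__":
    for candidate in ["0709-158843", "0709 158 843", "+46(0)709158843"]:
        print(candidate, is_canonical(candidate))
```

Only the first of the three example numbers passes. The other two denote the same phone number, but they have not yet been converted to the canonical form.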

When choosing your canonical form and how to represent and store it, you must obviously look at the system as a whole. Your choice will probably be guided both by the functionality you want to provide and by system qualities (NFRs) such as capacity, response time, and security. E.g., if you are managing a forum site where discussions are held mainly in English, you might restrict comments to contain only A-Z, a-z, some whitespace, and some punctuation. If you have several languages, but mostly West European ones, you might not be able to restrict the character ranges, but you can store the text as UTF-8 to save storage capacity. If there is a lot of text in other scripts, you might use UTF-16 instead, since e.g. Chinese or Japanese characters typically take three bytes each in UTF-8 but only two in UTF-16. Finally, if the content is solely for publishing on the web and it can contain characters like '<', you might HTML-encode it for security reasons.
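The forum example can be sketched in Python as an input handler that first checks a restricted character set and then HTML-encodes the text before storage. The exact whitelist below is a hypothetical assumption: English letters, digits, spaces and line breaks, and a handful of punctuation marks, with '<', '>' and '&' deliberately allowed so that the encoding step has something to do. The name `to_canonical_comment` is also made up for illustration. The only library call used is the standard `html.escape`.

```python
import html
import re

# Hypothetical whitelist for an English-only forum: letters, digits,
# space/tab/line breaks and some punctuation (including < > & so that
# the HTML-encoding step below is actually exercised).
ALLOWED_COMMENT = re.compile(r"[A-Za-z0-9 \t\r\n.,;:!?'\"()<>&-]*")


def to_canonical_comment(raw: str) -> str:
    """Reject characters outside the agreed set, then HTML-encode the rest."""
    if not ALLOWED_COMMENT.fullmatch(raw):
        raise ValueError("comment contains characters outside the allowed set")
    # html.escape turns & < > " ' into entities, so stored text is safe to publish
    return html.escape(raw)


if __name__ == "__main__":
    print(to_canonical_comment("Hello <world> & co!"))
    try:
        to_canonical_comment("Hej \u00d6berg")
    except ValueError as err:
        print("rejected:", err)
```

Note that storing HTML-encoded text is itself a design choice. It only makes sense when, as in the scenario described, the web page is the sole consumer of the content.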

After deciding on the canonical form and its representation, it is the responsibility of every input handler to validate incoming data and convert it to the canonical form. The logic for doing this is preferably put in a value object class (PhoneNumber), so that it is easy to find and use. Then the rest of the application can safely use this type (for fields, variables, arguments, and return values), making the rest of the code more precise and expressive.
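As a minimal sketch, assuming Python and the canonical form "0ddd-dddddd[d]*", a PhoneNumber value object could look like this. The conversion rules in the hypothetical `parse` method are my own assumptions, not a general phone number parser:

- drop a "(0)" trunk marker, spaces, hyphens and parentheses;
- rewrite a leading "+46" or "0046" to "0";
- insert the hyphen after the first four digits.

The last rule only fits numbers like the mobile number in the example, since Swedish area codes vary in length.

```python
import re
from dataclasses import dataclass

# Canonical form "0ddd-dddddd[d]*" (ASCII digits only)
_CANONICAL = re.compile(r"0[0-9]{3}-[0-9]{6,}")


@dataclass(frozen=True)  # immutable, with value-based __eq__ and __hash__
class PhoneNumber:
    value: str

    def __post_init__(self) -> None:
        # An instance can only ever hold a number in canonical form
        if not _CANONICAL.fullmatch(self.value):
            raise ValueError(f"not a canonical phone number: {self.value!r}")

    @classmethod
    def parse(cls, raw: str) -> "PhoneNumber":
        """Hypothetical input-side conversion from common variants."""
        s = raw.strip().replace("(0)", "")   # "+46(0)709..." -> "+46709..."
        s = re.sub(r"[\s()-]", "", s)        # drop spaces, parentheses, hyphens
        if s.startswith("+46"):
            s = "0" + s[3:]                  # international prefix -> leading 0
        elif s.startswith("0046"):
            s = "0" + s[4:]
        if not re.fullmatch(r"[0-9]+", s):
            raise ValueError(f"unrecognised phone number: {raw!r}")
        # Assumes a four-digit prefix, as in the example "0709-158843"
        return cls(s[:4] + "-" + s[4:])


if __name__ == "__main__":
    variants = ["0709-158843", "0709 158 843", "+46(0)709158843"]
    numbers = {PhoneNumber.parse(v) for v in variants}
    print(numbers)  # all three variants collapse into one value
```

Because the constructor rejects anything that is not canonical, a function that takes a PhoneNumber never has to wonder which of the input variants it was given. Equality and hashing then behave like the domain experts expect.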

Also, by default all data presentations (i.e. output) will use this format as well. If some presentation needs another form, the responsibility falls on it to convert to the format needed. E.g., if some listing wants all phone numbers formatted as "0709 [tab] 15 88 43", it is up to that listing to convert the phone numbers to that format. In the same way, if some presentation needs a specific encoding (UTF-16, Base64, or HTML-encoded), it is up to that presentation tier to do the conversion.
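A presentation-side conversion for the listing example could be as small as the following Python sketch. It takes a phone number already in the canonical form, e.g. "0709-158843", and produces "0709", a tab, then the subscriber digits in pairs. The function name `format_for_listing` is hypothetical. So is the grouping rule for an odd number of digits (a leading group of three); the original only shows the six-digit case.

```python
def format_for_listing(canonical: str) -> str:
    """Hypothetical listing format: '0709-158843' -> '0709<TAB>15 88 43'."""
    prefix, subscriber = canonical.split("-", 1)
    # Assumption: an odd digit count gets a leading group of three
    head = 3 if len(subscriber) % 2 else 2
    groups = [subscriber[:head]]
    groups += [subscriber[i:i + 2] for i in range(head, len(subscriber), 2)]
    return prefix + "\t" + " ".join(groups)


if __name__ == "__main__":
    print(repr(format_for_listing("0709-158843")))   # six subscriber digits
    print(repr(format_for_listing("0709-1588431")))  # seven subscriber digits
```

The function only reads the canonical form and never writes back. That keeps the listing's special format out of the model.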

In this way, life within the model becomes simple: searching and matching can be done without trouble. At the same time, we can support all the input and output formats we need by pushing encoding and conversions out towards the system border.



Wednesday, 1 July 2009

Not Enforcing Architecture

Dear Junior

As I quite often take on the role of software architect (whatever that is), I am from time to time asked by other architects how I enforce architecture, certain patterns, or coding conventions. Their problem seems to be that they spend a lot of energy and effort defining these rules or guidelines, but developers in general do not follow them. So how do they enforce them?

The question always takes me aback. Besides not really recognizing the scenario, I have some trouble with the question itself, especially the use of the word "enforce".

To start with, if you believe that there is something like "free will", you cannot force anybody to do anything. From a philosophical perspective, anyone is always in a position to say "no", taking the consequences of course, but still refusing to be forced. The only way to really force someone is to ensure that the consequences are catastrophic, so that refusing is not an option. Total annihilation of mankind would be such a scenario. Or, from a personal perspective, pointing a gun at someone's head would suffice to deprive them of their free will, thus enforcing some behaviour.

Well, pointing a gun at the developer's head will probably not be a feasible way to enforce some coding standard. In particular, it has the drawback that you can only enforce it on one programmer at a time, or two if they are seated within arm's length and you have two guns. Apart from that, I think it would be illegal in most countries I have worked in.

So, the programmer must follow your architecture out of free will, taking into consideration the consequences of not adhering to it. Let us get back to this later.

Secondly, the word "enforce" is most often used in the context of "law enforcement" or "enforcing legislation", referring to the power of the police, the courts, and the rest of the justice system to uphold the rules of society, which is usually considered to be a good thing. Let us pause for a moment and consider why this is considered good, even though it restricts our freedom to choose our actions. The legitimacy of the legislation in our democratic societies derives from the fact that it has been decided in an open process, after careful discussion and debate between delegates elected to represent the people governed by the laws.

So, how often does this structure apply to the relation between the architect, the architecture, and the developers? Is the architect elected by the developers in free elections, and is the architecture debated and voted upon? In my experience, when an architect asks for advice on enforcement, the answer to these questions is practically always "no".

If we are to find a system of government that resembles the structure of most organisations with an architect and developers, despotism comes closest. The despot makes decisions that he (in most cases it is a he) considers wise; the subjects have no say in the matter and must obey. Of the different kinds of despotism, the situation is actually best matched by a kingdom where the power of the king stems from a higher power (i.e. a god in historic societies, with Upper Management playing that role in development organisations).

I am sorry, but if someone asks me for advice on how to enforce legislation in a one-ruler dictatorship, I will simply and discreetly decline to cooperate.

Back to free will and making people follow rules. If you do not point a gun at their heads, the reason for developers to follow "your" architecture is the consequences of not doing so. The architect might get angry at them, they might have an unpleasant discussion with the boss, or they might eventually be fired. Basically, they will follow the architecture out of fear. And, having seen quite a few teams in various situations, I can tell you that fear is not the best motivator for producing high-quality software.

So, the bottom line: I will not enforce architecture. I will not help enforce architecture. I think it is a bad idea to enforce architecture. I think that if you want to enforce architecture, you are working in the wrong direction.

Do not enforce architecture. Do not even consider doing it. Find a better way instead.