Friday 27 August 2010

Two Types of Performance

Dear Junior

In architecture one of the most important tasks is to keep an eye on the non-functional (or quality) attributes of the system. Often this importance is enhanced by stakeholders holding up some non-functional requirement (NFR) saying "the system must really fulfill this". Unfortunately, these NFRs are often quite sloppily formulated and a key example is "performance". I have stopped counted the times I have heard "we must have high performance".

I think a baseline requirement for any NFR is that it should be specific. There should be no doubt about what quality we talk about. In my experience "performance" can mean at least two very different things. Instead of accepting requirements on performance I rather try to reformulate the NFR to use some other wording instead. I have found that the "performance" asked for often can be split into two different qualities: latency and throughput.

Latency or Response Time

With latency or response time I mean the time it takes for some job to pass through the system. A really simple case is the loading of a web page, where the job is to take a request and deliver a response. So we can get out our stop-watch and measure the time it takes from the moment we click "search" until the result-page shows up. This latency is probably in the range of 100 ms - 10 s. Of course, this response-time is crucial to keep the user happy.

But latency can also be an important property even without human interaction. In the context of cron-started batch jobs it might be the time from the input-file is read until the processing has committed to the database. The latency for this processing might have to be short enough so the result does not miss next batch downstream. E g it might be crucial that the salary-calculation is finished before the payment-batch is sent to the bank on salary day. 

In a less batch-oriented scenario the system might process data asynchronously, pulling it from one queue, processing it, and pushing it onto another queue. Then the latency will be the time it takes from data being pulled in until the corresponding data is pushed out at the other end.

All in all, the latency is seen from the perspective of one single request or transaction. Latency talks about how fast the system is from one traveller's point of view. Latency is about "fast" in the same way as an F1-car is fast, but will not carry a lot of load.

Throughput or Capacity

On the other hand, throughput or capacity talks about how much work the system can process. For example, a news information portal might have to handle a thousand simultaneous requests — because at nine o'clock coffee break a few thousand people might simultaneous surf to that site to check out the news.

Throughput is also important in the non-interactive scenario. Each salary-calculation might only take a few seconds, but how many will the system be able to process during those 10000 s between midnight (when it starts) and 02:45 when the bank-batch leaves. If we cannot process all 50 000 employees, some will complain. To meet the goal we need a throughput of five transactions per second.

In other words, where latency was the performance from the perspective of one client or transaction, then throughput is the performance seen from the perspective of the collective of all clients or transactions, how much load the system can carry. Here "load" in the same way as a bus will take a lot of load transporting lots of people at once, even if it is not fast.

Fastness vs Load Capacity

Both F1 cars and heavy-duty trucks are no doubt "high-performing" cars. But they are so in completely different ways. To have a F1 car showing up at the coal mine would be a misunderstanding that only could be matched by the truck at the race track.

So, I avoid talking about "performance" and risk misunderstanding. Instead I try to use "latency" and "response time" to talk about how fast things happen — while thinking about an F1 car.  And I use "throughput" and "capacity" to talk how much load the system can handle — while thinking about a bus full of people.

What is the latency for transporting yourself between Stockholm and Gothenburg using a F1 car or public-transport bus? What is the throughput of transporting a few thousand people from Stockholm to Gothenburg using an F1 car or public-transport bus?



P s Now when we are moving to multicore, we will see increasing capacity but latency leveling out or getting worse. This in itself will be a reason to move to non-traditional architectures, where I think the event driven architectures (EDA) is a good candidate for saving the day. My presentation at the upcoming JavaZone will mainly revolve around this issue.