June, 2009
by Michael Ley
"Look, capacity management isn't rocket science. All we have to do is monitor the CPU / disc / network and when it gets above the safe threshold we'll just upgrade"
How many times have I heard that approach to capacity management expounded by someone, usually someone who has become involved in capacity or performance for the first time?
The difficulty in responding to such a proposition is that, simplistically, the approach seems to work, and arguments for adopting techniques such as modelling appear to complicate the process and make it more costly.
Of course there are issues with the approach, like:
But these "complications" are usually swept aside in the enthusiasm for a simple solution.
In my experience the default capacity management approach is to upgrade when utilisation reaches a certain level, say 70%. In fact, the approach is so imbued in management thinking that I have heard some people describe the approach as:
The question is "does this technique do what it says on the tin"? Is capacity management really that simple?
So let's go back to the beginning: how did the 70% utilisation threshold arise?
In the dark and distant past (not really all that distant, but in our business anything more than 5 years old is looked upon as almost ancient history) the mathematicians came up with a series of formulae that describe how long someone would wait in line to be served by a server (in this context the word server is generic and can be a checkout in a supermarket or a CPU in a computer system). The mathematicians called these formulae "queue theory" and they can be used to predict computer response time.
Now being mathematicians, they were quite precise about defining the queue systems they were solving and devised a shorthand (Kendall's notation) to classify them. The most generally applicable queue goes by the snappy title "M/M/1/∞", which is shorthand for:
Within these constraints you can then derive a formula for the "stretch factor", a curve related to the utilisation of the server that predicts how much your basic service time will be extended because other people are ahead of you in the queue (see figure 1). Thus, in this situation, queue theory predicts that at 50% utilisation if you want one second of service time you will take two elapsed seconds to get served.
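As a rough illustration of the stretch factor described above, the sketch below uses the standard M/M/1 result that elapsed time divided by service time equals 1 / (1 - utilisation). The function name mm1_stretch is purely illustrative and does not come from any particular tool.

```python
def mm1_stretch(utilisation: float) -> float:
    """Stretch factor (elapsed time / service time) for an M/M/1 queue."""
    if not 0.0 <= utilisation < 1.0:
        raise ValueError("utilisation must be at least 0 and below 1")
    return 1.0 / (1.0 - utilisation)


if __name__ == "__main__":
    # Print a few points on the curve to show how quickly it steepens.
    for rho in (0.10, 0.50, 0.70, 0.90):
        print(f"{rho:.0%} utilisation -> stretch factor {mm1_stretch(rho):.2f}")
```

At 50% utilisation the function returns 2, which matches the "one second of service takes two elapsed seconds" example.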

Later, when the mathematicians were asked where the optimum point on the curve was, they defined the "knee in the curve". I seem to recall my old math professor defined this as the point on the curve where you wait in the queue for twice the service time. In an M/M/1 curve this happens at 66.7% (two-thirds) utilisation, which is where he told me we get the 70% threshold from.
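The two-thirds figure can be checked with a small sketch. It assumes the standard M/M/1 result that time spent waiting in the queue is service time x utilisation / (1 - utilisation), and rearranges it to find the utilisation at which the wait is a chosen multiple of the service time. The function name is hypothetical.

```python
def mm1_utilisation_for_wait(wait_multiple: float) -> float:
    """Utilisation at which the M/M/1 queue wait equals wait_multiple x service time.

    Wait = S * rho / (1 - rho); setting Wait = k * S gives rho = k / (1 + k).
    """
    if wait_multiple < 0.0:
        raise ValueError("wait_multiple must not be negative")
    return wait_multiple / (1.0 + wait_multiple)


if __name__ == "__main__":
    knee = mm1_utilisation_for_wait(2.0)  # wait twice the service time
    print(f"Knee in the M/M/1 curve: {knee:.1%} utilisation")
```

Under that definition of the knee, the total elapsed time is three times the service time: two units waiting plus one unit being served.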
So is there, then, a mathematically validated utilisation level which we should strive to run our systems below? Not quite.
Remember all those precise definitions I mentioned earlier? What happens if they are changed? As you might expect, the curve changes.
Say you increase the number of CPUs from one to four; you are then changing the number of servers from 1 to 4. This pushes the knee in the curve to the right (it occurs at a higher utilisation level) and the breakaway is more dramatic when it comes (see figure 2).
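The effect of adding servers can be sketched with the standard Erlang C formula for an M/M/c queue. It assumes identical servers fed from a single shared queue, which real multi-CPU systems only approximate. The function names erlang_c and mmc_stretch are illustrative.

```python
def erlang_c(servers: int, utilisation: float) -> float:
    """Probability that an arrival has to queue in an M/M/c system."""
    offered_load = servers * utilisation  # in erlangs
    # Build Erlang B with the usual recursion, then convert it to Erlang C.
    blocking = 1.0
    for k in range(1, servers + 1):
        blocking = offered_load * blocking / (k + offered_load * blocking)
    return blocking / (1.0 - utilisation * (1.0 - blocking))


def mmc_stretch(servers: int, utilisation: float) -> float:
    """Stretch factor (elapsed time / service time) for an M/M/c queue."""
    if servers < 1 or not 0.0 <= utilisation < 1.0:
        raise ValueError("need servers >= 1 and 0 <= utilisation < 1")
    # Mean wait in service-time units = P(wait) / (c * (1 - rho)).
    return 1.0 + erlang_c(servers, utilisation) / (servers * (1.0 - utilisation))


if __name__ == "__main__":
    for rho in (0.50, 0.70, 0.90, 0.95):
        one = mmc_stretch(1, rho)
        four = mmc_stretch(4, rho)
        print(f"{rho:.0%}: 1 server -> {one:.2f}, 4 servers -> {four:.2f}")
```

With one server the result reduces to the M/M/1 curve. With four servers the stretch stays low for longer and then climbs steeply as utilisation approaches 100%.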

Alternatively, if you know there will only ever be 5 people who will compete for the server (this could be the number of people in a small branch office) then the curve flattens and the stretch off to infinity at 100% utilisation never happens (see figure 3). This is because the most you can ever wait is five times the service time, i.e. the service times of the four other people who get in before you, plus your own.
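A minimal sketch of the finite-population case follows. It uses exact mean value analysis for a closed system with one server and a fixed number of users, each of whom "thinks" for a while between requests. The think_time parameter and the function name are assumptions made for illustration; they are not part of the description above.

```python
def finite_population_stretch(users: int, service_time: float, think_time: float):
    """Return (utilisation, stretch) for a fixed set of users sharing one server.

    Exact mean value analysis: an arriving request finds the queue as it would
    be with one user fewer in the system.
    """
    if users < 1 or service_time <= 0.0 or think_time < 0.0:
        raise ValueError("need users >= 1, service_time > 0, think_time >= 0")
    queue_length = 0.0
    response = service_time
    throughput = 0.0
    for n in range(1, users + 1):
        response = service_time * (1.0 + queue_length)
        throughput = n / (response + think_time)
        queue_length = throughput * response
    return throughput * service_time, response / service_time


if __name__ == "__main__":
    # Shorter think times push utilisation towards 100%, yet stretch stays bounded.
    for think in (20.0, 5.0, 1.0, 0.0):
        rho, stretch = finite_population_stretch(5, 1.0, think)
        print(f"think time {think:>4}: utilisation {rho:.0%}, stretch {stretch:.2f}")
```

With five users the stretch never exceeds 5, even when the server is fully busy, which is the flattening of the curve described above.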

Yes, the truth is that when the mathematicians were asked about the knee in the curve they came up with a precise answer for a set of precise circumstances, but unless your real environment replicates those circumstances exactly, the 70% threshold is likely to be non-optimal.
Furthermore, even if the knee is in the correct place for your system, the amount of queuing may not meet your service level requirements. For example, at 70% utilisation M/M/1 predicts it will take a little over 3 units of elapsed time to get 1 unit of service time. Now when you are dealing in milliseconds (CPU time), multiplying the service time by three is unlikely to be noticeable to the end user. However, if you are dealing in seconds (multiple disc reads), multiplying the service time by three will be noticeable, in which case you may want to run with lower queuing levels.
Alternatively, increasing the speed of a server gives you the opportunity of running at higher utilisation levels without impacting service levels. For example, say the response time service level is just being met when you are running at 70% CPU utilisation and the solution is to double the CPU speed. Simplistically, the CPU service time will now halve and, according to the M/M/1 formula, you should be able to run the new CPU at about 85% and achieve the same level of CPU response. Thus, doubling the CPU speed may more than double your usable capacity.
Yes, no matter how much some people would like it, the idea that there is a single utilisation level that tells you when you need to upgrade your infrastructure is a fallacy. Worse, it has hidden costs that make the approach inefficient.
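The arithmetic behind the CPU-doubling example can be sketched as follows, again assuming the M/M/1 formula (response time = service time / (1 - utilisation)) and a service time that halves exactly when the CPU speed doubles. The function and variable names are illustrative.

```python
def mm1_response(service_time: float, utilisation: float) -> float:
    """M/M/1 response time: service time stretched by queuing."""
    return service_time / (1.0 - utilisation)


def mm1_max_utilisation(service_time: float, target_response: float) -> float:
    """Highest utilisation that still meets target_response (R = S / (1 - rho))."""
    return 1.0 - service_time / target_response


old_service = 1.0                              # arbitrary time unit
target = mm1_response(old_service, 0.70)       # response time just meeting the SLA
new_service = old_service / 2.0                # CPU twice as fast
new_utilisation = mm1_max_utilisation(new_service, target)

# Work completed per unit time is utilisation / service time.
old_throughput = 0.70 / old_service
new_throughput = new_utilisation / new_service
capacity_ratio = new_throughput / old_throughput

if __name__ == "__main__":
    print(f"New CPU can run at {new_utilisation:.0%} for the same response time")
    print(f"Usable capacity grows by a factor of {capacity_ratio:.2f}")
```

Under these assumptions the gain comes from two sources: each request needs half the CPU time, and the faster CPU can be run hotter before the same response time is reached.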
So are there alternatives?
Within the technical arena, the measurement of queue lengths is a more accurate measure of delay incurred. There is an old truism: your response time is the time you wait for others who are ahead of you in the queue to get serviced, plus your own service time. By counting the number of people in the queue you have a direct relationship to response time, which intrinsically accounts for changes in the number of servers or size of user population.
This can be seen by looking at figure 2. If you take a constant stretch value, which equates to queue length, the utilisation at which that level of response is reached changes automatically as the number of servers changes.
Of course, even with this measure there is no agreed definition of what is good and what is bad. The acceptable queue length threshold should be set to meet your service level targets. In the absence of these, my old math professor's definition of the "knee in the curve" (see above) may serve as a starting point.
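To make the point concrete, the sketch below fixes a target stretch value and searches for the utilisation at which an M/M/c queue reaches it, for different numbers of servers. It assumes the standard Erlang C model with identical servers and a shared queue, and uses a simple bisection search. All names are illustrative.

```python
def mmc_stretch(servers: int, utilisation: float) -> float:
    """Stretch factor (elapsed time / service time) for an M/M/c queue."""
    offered_load = servers * utilisation
    blocking = 1.0  # Erlang B, built up by recursion
    for k in range(1, servers + 1):
        blocking = offered_load * blocking / (k + offered_load * blocking)
    p_wait = blocking / (1.0 - utilisation * (1.0 - blocking))  # Erlang C
    return 1.0 + p_wait / (servers * (1.0 - utilisation))


def utilisation_for_stretch(servers: int, target_stretch: float, tol: float = 1e-9) -> float:
    """Utilisation at which the stretch factor reaches target_stretch (bisection)."""
    if target_stretch <= 1.0:
        raise ValueError("target_stretch must be greater than 1")
    low, high = 0.0, 1.0 - 1e-12
    while high - low > tol:
        mid = (low + high) / 2.0
        if mmc_stretch(servers, mid) < target_stretch:
            low = mid   # still below the target, so move right
        else:
            high = mid  # at or above the target, so move left
    return (low + high) / 2.0


if __name__ == "__main__":
    for servers in (1, 2, 4, 8):
        rho = utilisation_for_stretch(servers, 3.0)
        print(f"{servers} server(s): stretch of 3 reached at {rho:.1%} utilisation")
```

The same stretch target maps to a different utilisation for each server count, which is why a threshold expressed as queuing carries across configurations where a fixed utilisation figure does not.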
Unfortunately, queue length measurements are not the complete answer and are not as readily understood as utilisation. A better and more understandable approach is to measure response time. This gives a direct measurement of the user experience and relegates utilisation to that of a supporting technical measure.
Does the availability of these alternatives mean we can move away from utilisation as a measure in the near future? That's unlikely. Utilisation is a simple tool to apply and is widely, if not always correctly, understood by managers. Further, utilisation thresholds are seen as a "risk mitigator", something that provides a buffer against the unexpected. However, if a utilisation threshold is used to manage capacity, the aware IT manager must understand that:
Truly, the 70% utilisation threshold should be seen as no more than a general ROT (Rule of Thumb).