February, 2010
by Adam Grummit
This article is the last of a series of five based on extracts from Capacity Management: A Practitioner Guide which was published by Van Haren earlier this year. The particular conference drinking theme of this article was developed with enthusiasm early in the history of UKCMG and cultivated at CMG where it flourished and was awarded best capacity planning paper (some time ago).
It attempts to provide an insight into the issues involved in creating mathematical models of the behavior of computers without delving too deeply into the underpinning theory.
There are numerous articles and white papers available on the subject of capacity management. They tend to fall into three classes. There are the mathematically oriented ones that discuss the underlying principles of queuing theory and related statistical techniques. There are the business oriented ones that discuss the philosophy of governance and process management. Finally there are the pragmatic ones with discussions of practical experience on particular domains or the latest new 'hot topic'.
Papers on theory can very quickly develop into a mathematical treatise with elegant equations and abstruse algorithms for the few. Papers on process definition can very quickly develop into an unending set of checklists of guidelines and project management that can give middle-management a bad name and lose the interest of the experienced practitioner. Papers on current experience tend to be very site and domain specific and they age rapidly.
Capacity Management: A Practitioner Guide tries to give a reasonable appreciation of capacity management from all three perspectives. It takes an understanding of IT service management (ITSM) processes as a starting point and expands on the specific issues raised when implementing or improving capacity management practices. It introduces elementary arithmetical approaches to analyzing data.
This article focuses on modeling. However, it is not a rigorous mathematical treatise nor is it intended to be taken too seriously (note the litotes, if you read number three in this series of five articles).
COCS model
Modeling concepts can be introduced formally as in various learned texts or via a simple introductory evolving model, as follows. The model described is defined as the COCS model - Computer Optimization Client-Server model. Or maybe it applies at some computer conferences, where it might stand for 'Conference Organizing Committee Services'.
Let's start with a simple outline and a key question at such events such as - 'How long will it take on average to get a drink at the bar tonight?' Various laws will help to provide the answer to this and related questions. In the 'Leptokurtosis' article four in this series, Little's Law, the Utilization Law and Response Time Law were introduced. The answer to this first question is provided by Little's Law:
Little's Law states that:
If a number of customers N are observed in a queue in a stable system and served in a period of time T and there are an equal number of arrivals A and leavers L, then, throughput is defined as X = A/T = L/T
Response time, the long-term average time spent resident in the queue and being served is R and R = N / X
This requires just two performance metrics, the number in the queue and the throughput to derive a performance time. This is important for situations when response times are not available from the instrumentation.
But a more basic question is 'How long will it take me to get a drink tonight?' The answer depends on both the queuing time and also the service time (the time spent negotiating the response from the service point for the particular drink required).
Little's Law again will give an average time, but my question needs a particular answer, based on my own selection of drink and the time it takes to serve it as Response time = Queuing time + Service time. Service time is a function of the relevant device characteristics and the queuing time is a function of contention and utilization. Analysis of response time requires the use of further laws, and the concept of M/M/1 models which were introduced in the 'Leptokurtosis' article in this series. Such models assume that the gaps between the arrivals of transactions are exponential[1] and so too are the service times. Think of a rapid, enthusiastic drinker of beer picking up a pre-filled tankard (short service time), downing it quickly (short think-type time) and renewing it frequently (short inter-arrival gap at the bar) compared with a cultivated port drinker choosing the best vintage, sipping it slowly and renewing it infrequently. Think also of the average waiting time for a deterministic metro service that runs every ten minutes (simple answer based on various assumptions is 5 minutes). Then think of the average waiting time for a random taxi service at a place where, on average, six pass by each hour (simple answer based on various assumptions is 10 minutes).
M/M/1 models
Queuing models are generally constructed to represent the steady state of a queuing system, that is, the typical, long run or average state of the system. As a consequence, these are stochastic[2] models that represent the probability that a queuing system will be found in a particular configuration or state.
For M/M/1 models the following is true: For M/M/N models the following is true:
Response Time Law: R = S/ (1-U) Response Time Law: R = S/ (1-UN)
QL = U/(1-U) and QT = S * QL
The M/M/1 notation was derived by Kendall and is a simple case of the general expression A/B/s/q/c/p where:
A is the arrival pattern (distribution of intervals).
B is the service pattern (distribution of service duration).
S is the number of servers.
Q is the queuing logic (FIFO, LIFO, RR...) omitted for FIFO.
C is the system capacity, omitted for unlimited Q (open).
P is the population size, omitted for open systems.
A and B in this expression are frequently assumed to be M, but there is a range of sample values for arrival and service distribution characteristics:
M is Poisson (Markovian) process - exponential distribution. This is 'memory-less' where, for example, the inter-arrival time does not depend on previous inter-arrival times.
Em is Erlang distribution of intervals or service duration.
Ek is Erlang k-distribution of intervals or service duration
Hk is k-stage Hyperexponential distribution of intervals or service duration
D is deterministic arrivals & constant service duration.
G is general (any) distribution.
GI is G with independent random values.
Open and closed systems
In an open system, the customers are allowed to enter or leave the network. These are also called infinite population models since there is no limit to who may come in the door, and the actual number of people present is unknown. The requests over the network on a remote client-server system may be modeled as open systems. In a closed network, the number of customers is constant in the network at all times, hence they are also known as finite population models. In practical terms, batch and interactive systems are often modeled as closed systems.
One of the frustrations waiting in the queue is the apparent inactivity of the barman. His utilization is readily defined:
In a period of time T, the barman is seen to be busy for a total time B.
There are L leavers in the period and throughput is defined as X = L/T and utilization is defined as U = B/T
Mean service time at the bar is defined as S = B/L and the rate of service = 1/S and the barman is busy for X.S of the total time
But U = B/T and B/T is the same as (B/L) . (L/T) thus U = S.X
That is the utilization law: server utilization, U = S.X
Multiple server model
The model discussed so far is based on a simple single server. Returning to the conference rinks analogy, modern techniques apply in some conference hotels, where a separate server is used to take the money at the till. Or even to have two separate tables with pre-poured wine and beer with a front man taking the request, providing the result and collecting a token on completion. So the model has to show the network of these multiple devices, each with their separate queues.
The flows of customers between different servers in a network are governed by the Forced Flow Law. Assume that in an observed interval, 10 customers are served and there are 30 less bottles on the beer table and 20 less on the wine. The system throughput is 10 and the individual servers have throughputs of 30 and 20. Each transaction has a visit count of 3 beers and 2 wines and each throughput can be expressed as the visit count times the system throughput.
Forced Flow Law: The throughput for any device is found by multiplying the throughput for the system by the number of requests at that device.
Another case of multiple servers is a single queue with multiple servers, called for example M/M/4, as in many post offices or quad-processor machines. Forced flow is not relevant here - but then this model is not used in many bars (perhaps sadly, for the smaller, less aggressive of us).
The main factor not considered so far, is the most enjoyable. The 'drink time' or 'think-type' time is independent of any contention and is determined only by the residence time at the particular consumer. This can be represented in a model by a 'delay server' where each customer has their own server and so there is no competition for service.
Multi-class model
The model, so far, assumes the barman makes no distinction between the customers, but life isn't like that. There is a need for a multi-class model to introduce priorities: committee members have special rights to submit priority interrupts and get better treatment. First-time delegates don't know the rules of the game and are second class citizens by definition, getting serviced on a time-sharing basis. The main body of experienced delegates works through the normal interactive systems. Bulk deliveries of replenishment stock tend to be actioned in batch well in advance of the busy time (hopefully).
So multi-class modeling is necessary for the different levels of service and the scheduling algorithm used by the particular barman. The scheduling algorithms used can vary, with the main ones being:
Computer modeling also caters for the high profile demand that always gets its request in first - (especially when I'm in the queue) - for a genuine emergency (system interrupt).
Model solution methods
These problems can be solved using convolution[3], mean value analysis[4], decomposition and aggregation, and other mathematical techniques and are 'exact and efficient' when applied to the models which can be reduced to a 'solvable form' (that is, 'product form models'). These terms are all within the standard mathematical vocabulary and typically well described within Wikipedia. But in real life there are detailed practical variables that impact on the situation, such that pursuing extensive mathematical solutions can be counter-acted by more simple approaches.
Consider the barman and his stock of immediately available services (such as beer and house wine) but with more specialized drinks in the cellars - with maybe different cellars for different wines and vintages. So when there are requests for a specific fortified wine such as Madeira then the barman has to decide which cellar may have it, and the service time taken depends on the level of his expert knowledge and the efficiency of the path length taken to find the bottle.
Some people like to model the service time in great detail - the time taken to lift the glass from the rack, to pull the lever or open the bottle, to fill the glass, to push back the lever or recork the bottle, to wipe off any spillage or froth and the time to push the glass across the hatch - followed by all the detail of the end of transaction negotiation, with passing of tokens and updating of registers.
However, all this detail is often unnecessary, since the mean time to serve a glass of wine or pint of beer is not so variable and is dominated by the pouring time and final negotiations. So the model could simply reflect these two functions and remain within say 10% (or better) tolerance accuracy.
Workload components
Resource demands and arrival rates tend to be highly variable - but can readily be grouped based on type. For example, the arrival rate is related to the percentage of the population concerned and also to their drinking time (also known as 'think-type time'). The number of visits to each server is also a significant variable. Beer drinkers may have a high arrival rate, high visit count and a short service time. Wine and spirit drinkers may have a lower arrival rate, lower visit count and a longer service time.
If think-type/drinking time = TTT and response time = R and number of delegates = N then the throughput is the number of delegates divided by the response time plus the think-type/drinking time. C = N/(R + TTT)
Parallel processors
Returning to the question of the manning of the bar, different bars use different systems to try to add value to their services. Some just stick to one very efficient and friendly processor of requests. Some add a second, who may work beside the first in an equal way sharing the load, or one may take the requests and feed some on to the other. Some add a second, but it is only loosely coupled to the first by, say, a fast hatch between two adjacent bars. Or more frequently these days, multiple processors share a single slot on the mother-board and are multi-core or multi-layers stacked to achieve greater throughput. Some have a whole row working side by side, with all requests at the bar decomposed into single drink requests to enable parallel working - and some massively parallel processing where 64 or even 128 barmen work side by side. That's what I call a bar - and was thought, at one time, to perhaps be the way for the future.
Client-server and n-tier systems
However, nowadays, more exotic techniques have evolved, known as client-server and n-tier systems. The local network of a group of friends at a single table, with a crude but effective communications mechanism (talking), can make their requests to a single person (who should be the largest to ensure the best service at the hatch, but tends to be the smallest because nobody wants to waste time on overheads like going to the bar). This end-terminal server can then make requests at the bar. If there is a single hatch, he has to wait for each request to be serviced. But if there is more than one server at the bar, and he can communicate with them, he can multi-program his demands. The bar has only a certain amount it can provide locally, and has to send off requests to the cellar for other functions. Fino sherry requests, for example, may require a request to a remote server in a central wine cellar.
The problem comes with the detail of networking to be modeled. The communications between the drinkers on the table, and the time it takes them to allocate the drinks once centrally received are negligible in the overall picture. But the communications between the end terminal server and the main processing server and the time between the main server and necessary accesses to its master cellar (or different cellars with wines, spirits, fortified wines et cetera) are a significant part of the total service time.
So the answer to the question is now: response time = time in network to joining queue + time in queue + service time + time in network to close
However, the service time is now more complex too, since it incorporates the server's own processing time and also the time of further server demands' responses. The further server demands are those where access is made to remote cellar(s) whose response time also consists of network delay, queue delay and processing time. These more complex situations can be modeled using combinations of established mathematical techniques such as convolution and Laplace transforms[5].
Various mathematical approaches have been developed to address the issues concerned with parallel processing and with virtual machines. Equally, given the complexity of end-to-end architectures, many solutions are now being developed on simpler approaches to give less accurate analyses which are, nonetheless, more than adequate for business decision support. These are mostly aimed at providing rapid answers on an automatic basis that can be readily applied to a large number of servers.
[1] Exponential equation y = e-x
[2] A stochastic process has a non-deterministic behavior in that a system's subsequent state is determined both by the process's predictable actions and by a random element so that the outcome for a given starting state is not always the same.
[3] Convolution is a mathematical operation on two functions, producing a third function that can be viewed as a modified version of one of the original functions. Convolution is similar to cross-correlation. It has applications that include statistics, signal processing, electrical engineering, and differential equations.
[4] Mean value analysis is a technique for computing expected queue lengths for a closed system of queues, developed by Reiser and Lavenberg in 1980.
[5] The Laplace transform is one of the best known and most widely used integral transforms, perhaps second only to the Fourier transform in its utility in solving physical problems. It is commonly used to produce an easily solvable algebraic equation from a linear ordinary differential equation. It has applications that include mathematics, physics, electrical engineering and probability theory.