mBrace- Part II
A pragmatic model driven approach for Performance Engineering of transaction processing systems
by Michael Kok
|About the Author|
Part one of this paper
Part one of this paper about the mBrace approach was published in the November issue of MeasureIT and shed light on the way the model assists in securing system performance. This second part gives an overview of some of the internals of the approach: the main features of the model, the way it is constructed and validated; the data that it uses and the way the data is collected. The model plays a central role in the approach.
The above figure shows the dashboard of the model used. The dashboard consists of three sections: 1. Parameters and outcomes. The parameters include transaction volume and resource capacities. 2. The graphic with a response time breakdown for each transaction type sorted by decreasing response time (the DNA profile). 3. The graphic showing utilisations of all resources of the infrastructure involved.
The two graphics tell us everything there is to know about the performance of the information system. They visualise what happens to response times and utilisations when the transaction volume is increased. The parameters for the resource capacities can be used to show the effects on response times by horizontally and / or vertically scaling of the resources.
This originally was going to be a two-part paper, but due to the amount of material being covered, it was decided to turn this into three parts, with the third and last part being published in an upcoming issue of MeasureIT. The third part will treat the subjects of Measuring, Validating and will include a series of real life examples of using the mBrace approach.
Measuring and modelling
The simulation model that describes response times and resource utilisations has a central place in the mBrace approach. The example in the first part demonstrated a model that was filled with measured data. Creating such a model is not a matter of modelling or measuring; instead it is about the careful balanced combination of measuring and modelling. What and how we measure is determined entirely by the model and modelling is greatly influenced by the possibilities and limitation of measuring.
The model shows response time breakdowns of transactions at varying transaction volumes and infrastructure capacities. It also shows the utilisations of the resources, which enables us to evaluate capacities. The model is initially created with the possibilities of measuring in mind because there is no sense in creating a model for which we cannot obtain the measurement data. Subsequently, a measurement plan is made and implemented in conjunction with the model. It is the model that determines what data we must collect. Concessions may have to be made to the accuracy of the model by the limitations of the data collection. The resulting combination of model and data is extensively checked and validated in two steps.
Systems produce a lot of measurement data. The variety of data types and the amounts produced are mind-boggling. We hardly know what to do with them. As soon as we have our model this problem is completely solved. All we need is the input for the model. On the other hand, once we try to use a model and fill in data from standard measurement functions almost nothing seems to fit. How come we rarely collect a sum of squares for interarrival times, CPU usage or for storage accesses? How come a transaction concept is seldom implemented in a useful way and transaction-processing middleware rarely produces adequate measurement data on a per transaction basis?
We must keep in mind that the mBrace approach initially focuses on the stage of commissioning of an information system. Just doing the measurements is not sufficient there is more information to be collected. The performance analyst also needs to collect information from the business that intends to use the system. From this measuring is seen as part of the larger concept of data collection.
The following section provides more insight into Modelling. An overview of the construction of the model is presented in a way that avoids as much mathematics as possible.
Modelling is a technique that unfortunately is applied far too seldom. It seems to require a rare skill. In elementary education we learned about the greengrocer who sells apples. And the first models are of the kind: one apple costs 0.25, how much does a box of 20 apples cost? In Dutch these might be called the Bartjens-type models (Bartjens, 1584 - 1645, published a text book on arithmetic for primary education). Though 90% of the time our performance models consist of this kind of simple stuff - one disk access takes 4 milliseconds, how much time does a complete transaction with 35 accesses take for disk time? - it still seems to be a difficult occupation.
The model of the mBrace approach applies too much of the Bartjens type of modelling and a bit of queuing theory to calculate the waiting times in the response time breakdowns. Nonetheless queuing theory is an essential element. Only simple and rough queuing models are applied. Queuing networks are not used. When analysing system performance with the overall model, all waiting times are calculated using simple elementary open queues with rough approximations for non-negative exponential arrivals and service times. If for a certain single setting more accuracy is desired, then accurate queuing calculations may be done separately. This approach seems to be sufficiently accurate and it is relatively simple to implement. Remember: mBrace is a pragmatic approach!
2.1 The overall performance model
The basis of the overall performance model is the response time breakdown. Figure 2 shows the infrastructure of the example used in part one of this paper and figure 3 shows the corresponding response time breakdown.
Such a response time breakdown departs from the idea that a transaction spends time (latency) at the resources that do the work. If these resources are busy, then the transaction also spends time waiting for the resources.
Thus in this basic model the time spent at each resource is split into the time needed by the resource to process the transaction (the service demand) and the time the transaction must wait until that resource is ready from servicing concurrent transactions. The rs represent the times spent at the resources, the ws represent the waiting times caused by queuing.
Figure 3. The time spent at each resource consists of resource and waiting times.
The response time breakdown reflects the full infrastructure chain of the information system (TS stands for Terminal Server). A response time breakdown is maintained in a transaction bookkeeping for each transaction type.
Keep in mind that only if the latency at a resource is visible in the breakdown is that resource considered critical and shown in the breakdown. Only critical resources are interesting and have a place in the response time breakdown. After explorative investigation, the LANs and the WAN in this example case are considered non-critical and are left out of the response time breakdown. Figure 4 shows the remaining response time breakdown.
Figure 4. The response time breakdown implemented in the example.
In order to get accurate results, we need to go into further detail. So the time spent at the servers is split into pieces again. For each server, three resources are identified: the CPU, the hard disks or storage and memory. Figure 5 shows the resulting response time breakdown.
Figure 5. The response time breakdown further detailed.
In this implementation of the response time breakdown we can see that the desktop PC and the interfaces to the referential systems (customers, general ledger etc.) have no waiting times. The PC works for one user, has no concurrency and consequently has no waiting times. The interface was only partially investigated. The transport vehicle and the system at the other end of the interface were kept outside the scope of the study. So we only have a single value for the time spent at the interface in the test environment. At higher volumes, the interface time may increase considerably, but this is not shown in the model. If the interface appears to be interesting additional investigation is done.
Having observed this, the question is whether we can determine the time a transaction spends at a resource and how long does waiting for that resource take? In principle we measure the time spent at the resource and we calculate the time spent waiting for the resource. Queuing theory is used for calculating the waiting times for the resources. When the transaction volume increases the response time has a tendency to increase too. This is caused by growing waiting times. The time spent at the resources themselves does not change when volume changes. It is not difficult to see that the amount of processing by the CPUs does not change when other transactions are processed too. The number of I/Os for a transaction does not change when other transactions are processed, the transaction uses the same data.
It is important to understand that the mBrace model may be viewed as an overall model consisting of various sub-models. In the following sections, queuing models may be discussed as sub-models of the overall model.
2.2 Queuing theory
Queuing theory and mathematics
Queuing theory is both a nice instrument to torture students and an essential technique for performance and capacity management. This section explains the way queuing theory is applied in the mBrace model. Care is taken to avoid an overdose of mathematics. The focus in this explanation is rather on the way it is applied.
The elementary queuing system
Waiting times play an important role in response times. When transaction volumes grow, response times tend to increase. The differences are determined a great deal by queuing. A considerable part of response times may be determined by the waiting times for the various resources, especially when growing transaction volumes make their utilisations increase. In order to calculate waiting times, each critical resource is modelled using queuing theory. The elementary queuing system depicted in figure 6 helps to describe the performance behaviour of the resource. From figure 5, we can see that there are nine (count the ws) elementary queuing models incorporated in the overall model.
In figure 6 there is a population of sources that produce service requests at a rate λ per second. Sometimes E(ta), the average time between successive service requests of the sources, is used instead of λ. There are n servers. - Be aware that the term server is used here in the context of queuing theory and do not mix it up with such things as application and database servers. The average utilisation of the servers is ρ.
One of the sources generates a service request. When there is a server that is not busy it starts processing the service request immediately. If all servers are busy then the service request has to wait in the queue until a server becomes available. On average, a service request spends E(tw) seconds in the queue. The server processes the request taking E(ts) seconds on average. The source returns to its original state.
The average response time equals E(tq). n, the number of servers, λ , the arrivals per second and E(ts), the average service time are known values. E(tw) and E(tq) are the values we want to know and they are calculated.
Figure 6. Schematic figure of an elementary queuing system
Mapping the resource on the model of the elementary queuing system
In order to calculate the waiting time for a resource we first map it on the queuing system. For each resource the following questions are answered: What are the sources (or customers or service requesting elements)? What are the servers? E.g. CPUs can be viewed as the servers. The processes of a UNIX machine could be viewed as the sources, they produce the service requests for the CPUs. Then we have to determine what the arrival rate (λ) or the average interarrival time of the service requests is and the average service time E(ts). The interarrival and service times are governed by probability distributions. Apart from their average values, their dispersions are of interest for calculating waiting times. The coefficient of variation squared, CV2, is the measure used to describe dispersion [Footnote: The squared coefficient of variation for service time ts equals: CV2(ts) = Var(ts) / E2(ts). Var(ts) is the variance of the service time ts: E(t2s) - E2(ts).] . When we know these figures for interarrival and service times we can use tools to calculate the average waiting time E(tw)
The curves in figure 7 show two examples of how the waiting time depends on the server utilisation ρ. The X-axis represents the utilisation of the servers of the queuing system. The Y-axis corresponds with the waiting time factor that must be multiplied by the average service time E(ts) to obtain the waiting time E(tw).
Figure 7. Two examples of waiting time curves.
There are many types of elementary queuing systems. Each of the various types of queuing systems has its own curve with a shape comparable to those in figure 7 describing the waiting time. Figure 7 shows the curves for two queuing systems with a different number of servers. As can be observed, a queuing system with 16 servers has lower waiting times than the system with only one server.
There is a useful instrument for characterising the types of queues: Kendalls classification.
The following string is an example of how Kendall classifies queuing systems:
M The arrival process is Markovian, i.e. the interarrival times follow an exponential probability distribution. As a consequence the coefficient of variation for the interarrival times, CV2a =1. This is an important characteristic of the exponential distribution. M The second M in the string states that the service process also follows an exponential distribution. 4 In this example the number of servers = 4. 80 The maximum number of sources that can wait in the queue = 80. 600 The number of sources = 600. fcfs The queuing discipline, fcfs stands for first come first served.
Queuing systems with an infinite number of sources are called open queues in contrast with closed queues that have a finite number of sources.
In the mBrace model, the waiting times are calculated for all critical resources at a certain transaction volume and their values are filled into the response time breakdowns of the overall performance model. This is done for all transaction types within the transaction focus.
Positioning of the queuing modelling approach
The theory of networks of queues is not applied in the mBrace approach. Though this theory is a mathematically elegant approach, it is felt that it does not provide a sufficiently pragmatic solution for the overall model. In the mBrace approach, in fact, we still deal with a queuing network, but this is broken down into elementary queuing systems in (mathematically viewed) an informal way.
Memory is modelled in a layered fashion. A dynamically used memory pool is modelled as an elementary queuing system with a large set of servers. Each block from the pool is considered to be a server. Their service time is determined by the times the blocks are allocated to the transaction being processed. These times are determined by the times the transaction spends at the CPUs, the disks and other latencies caused by interfaces, networks and collaborating servers. All these times may include waiting times themselves.
Tools for calculating waiting times
There is a choice between analytical models and (discrete event) simulation tools for calculating waiting times. Simulation tools have a broader scope of solutions than analytical tools. However, they are less easy to implement. Moreover the processes that do the work in transaction processing information systems are relatively simple (though complicated enough). Therefore it is felt that simple analytical tools sufficiently serve our purposes.
The mBrace approach makes use of simple elementary analytical queuing models for calculations on the fly. Calculating these models takes relatively little processing time and within certain preconditions is sufficiently accurate.
Calculating waiting times per transaction
Since the model produces the response time breakdown of every transaction type, we have to deal with mapping the transactions on infrastructure usage and queuing. As an example, let us have a look at how CPU wait time is modelled. All transactions of the observed and environment applications together produce a statistical population of service requests to the CPUs. In order to get some idea about this we might divide the total amount of CPU reported in a certain interval by the number of interrupts in that interval. This shows that the average service time for CPU might be something like 0.4 msec. It is almost impossible to gain some information about the other statistical parameters of the CPU service time. As usual we assume an exponential probability distribution by default. Such characteristics of this population of service times determine the waiting time for CPU. E.g. at 85% CPU utilisation (1 CPU) the waiting time equals 3.4 msec. This means that each time a process needs the CPU it has to wait 3.4 msec on average. This also means that each time one I/O for that transaction has completed and it needs a CPU to continue processing it has to wait 3.4 msec. E.g. a transaction with 120 physical I/Os, in other words the total CPU usage of that transaction (the only thing we commonly can measure) being interrupted 120 times needs the CPU 121 times. Consequently the total average time the transaction waits for CPU equals 121 times 3.4 msec = 411 msec regardless of the amount of CPU usage of that transaction.
Applying queuing theory
The way queuing theory is applied in the mBrace approach for calculating average waiting time comes down to the following:
- Map each critical resource separately on the elementary queuing system (figure 6).
- Collect the data about the averages and dispersions of service times and arrival rates. If some of these data are not available: fill in the gaps by estimates and assumptions based on experience.
- Calculate the utilisation of the resource from the arrival rate and service time.
- Determine the type of queuing system (chose the curve) using Kendalls classification as a checklist.
- Determine the average waiting time for the resource by picking the waiting time from the curve at the value of utilisation (Figure 7). As an example, the yellow arrows in Figure 7 show that a queuing factor of 20 results from a resource utilisation of 95%. Multiplying the factor with the average service time, E(ts), yields the waiting time.
- Fill the result into the response time breakdown to calculate the total waiting time for each transaction.
2.3 Accuracy of the model
The accuracy of the overall performance model is important. How accurate is possible and how accurate is necessary? This depends on our goals and ambitions. What is it that we want to achieve? We want to give a clear and reliable picture of the performance of an information system to be commissioned. Reliable enough to decide whether its performance stands in the way of a successful rollout or not. Also it should support reliable decision making about investments in the infrastructure.
If we want to know what level of ambition is feasible we have to look at the quality of our measurement data and the sensitivity of our models for that quality. The following factors determine the accuracy of the model:
- The structure of the model that we construct.
- The accuracy of the waiting time calculations.
- The accuracy of the data fed into the model.
Since there are a number of compromises in the models used it is not shocking to learn that there are errors in the outcomes. The only interesting question is: how big are these errors?
A performance analysis should deliver quick results and at affordable costs. When average response times of 3.0 seconds are reported, it should not be 20 seconds in reality, but if they turn out to be 2.5 or 3.5 seconds in most cases this is sufficiently accurate. If an average response time of 200 seconds is reported and this turns out to be a 1,000 seconds, nobody will have a problem, the system is not working anyway. So the mBrace approach aims at fair accuracy at resource utilisations in the range up to 80%. The next sections treat some accuracy related aspects of the model.
2.3.1 The structure of the model
Calculating waiting time starts with the mapping of the real life system on the elementary queuing model. Commonly there are various alternative options to do that with a choice between simple rough solutions and more accurate solutions. The more detailed solutions demand more (accurate) data and may be more expensive for the effort needed. The rough models may be less accurate, but may be cheaper and quicker to be fed with data.
The elementary queuing systems are interrelated. In a queuing model used to analyse an application server, the queue receives service requests from outside and from a feedback loop inside. If the outside arrivals are negative exponentially distributed, then the mix with the feedback requests will change the arrival pattern into something that is not negative exponential anymore. With the tools we have it is not feasible to obtain more insight into the probability distribution of either the arrival process or the service process for the CPUs. In the mBrace approach this aspect is ignored.
One example is calculating total waiting time for CPU for a transaction type. There are at least two options. In both cases the waiting time for the average CPU usage of all service requests is used. The rough option calculates total waiting time of the transaction by multiplying total CPU usage of the transaction by the queuing factor. This underestimates waiting time for transactions with a high number of I/Os. Average response time for all transactions however is accurate.
The more accurate alternative multiplies number of interrupts (equals approximately number of I/Os for the transaction) with elementary waiting time for CPU usage.
2.3.2 Accuracy of the input
The accuracy of the model depends on the accuracy of the input fed into it. As the familiar saying goes, garbage in, garbage out. The accuracy is determined a great deal by the data we need to select the correct curve, but also by the accuracy of the utilisation. We use the following data in the model to determine the type of queue:
Dispersion of interarrival and service times, number of servers, max number in queue, number of sources and queuing discipline.
To determine the queuing factor, we need the server utilisation, the average arrival rate, average service time and number of servers.
2.3.3 Interarrival and service times
The dispersions of the interarrival times and the service times influence the waiting times.
The statistical measure used for expressing the dispersion of a value is the squared coefficient of variation (CV2). Waiting times tend to increase when the dispersions of interarrival and service times, either one of them or both, increase.
Information about the probability distribution including the dispersion of inter arrival times is difficult to determine, especially when dealing with new systems that are yet to be commissioned. We tend to assume that CV2a =1 (the exponential case) when we do not have other information. Let us see how this may work out from the following example. Table 1 shows a sample of CV2as of arrivals from a real life system. The table shows measured values of transaction volume, the averages and coefficients of variation of the interarrival times for periods of one hour.
Table 1. Example of CV2s in a real life system.
A performance analyst would normally focus on the period of 09:00 - 10:00 because it holds peak volume and as ever, presume that CV2a =1.0 for the interarrival times. In this particular case, as a coincidence, the value of CV neatly approximates 1.
Between 08:00 and 17:00 the values of the dispersion with CV2a =1.2 nicely approximates the negative exponential probability distribution, but there are also various periods e.g. 17:00 - 18:00 with CV2a =4.6 that would cause substantial errors if CV2a =1.0 were presumed.
Figure 8 shows the influence of CV2a on waiting time. From this, the effects of possible errors in the waiting time curves for one elementary queue are plain to see. The curves show the waiting time for one constant value CV2s = 1 for service times and values 1 through 5 of the CV2as for interarrival times. The utilisation is limited to the range between 40% and 80%. The figure reveals that the errors for CV2a = 1.2 instead of 1.0 are negligible, but CV2a = 5.0 instead of 1.0 are substantial. It is not only the dispersion of the interarrival times, but also of the average value of the service times that influences the waiting times. One curve for CV2a = 5.0 and CV2s = 5.0 is added to illustrate this.
Figure 8. Sensitivity of waiting times for changing dispersion of interarrival times.
If we take CV2a = 1, then at a 0.60 utilisation the average waiting time E(tw) = 0.18, while for CV2a = 1.2 the real value is: 0.21 and for CV2a = 5 the real value is: 0.70. At CV2a = 5, queuing times would increase almost 4 times. So in reality, waiting times would be 4 times as high as predicted. Let us see how such an error would affect the outcomes of the overall model if the error would occur on all queues.
Table 2. The effects of an error in all queuing calculations of the overall model.
Table 2 compares the response times calculated by our example model demonstrated in the first part, at full load and optimised capacities. The table reveals that the errors for
CV2a = 1.2 instead of 1.0 are negligible, but for CV2a=5, instead of 2.3 we would predict a 95p for response time of 4.6, which is a substantial error.
We have to keep in mind that errors can be introduced in both the dispersion of the arrival process and the service process. Before cutover, both are not known yet. The above analysis shows that the model is sensitive to errors in the values of dispersion of interarrival times and the times for processing service requests. For many a queue, the CV2s are unknown and in that case CV2=1 is taken as an assumption. This may introduce a considerable error.
2.3.4 Number of servers
Figure 9 provides some insight into the sensitivity of queuing times for the number of servers. A set of curves are drawn. For each curve, only the number of servers changes. The legend shows the curves classified according to Kendall. The various curves tell us that if we have work done, the waiting times become lower on average, if we do the work with more servers, even if the total capacity of the servers and the load are the same in both cases. Curves for the lower numbers of servers are typically used for symmetrical multiprocessing. The number of servers corresponds with the number of processors. We model the disks of a storage system or a pool of memory blocks that are dynamically used by the transactions as a multiple server queuing system too.
Rho (ρ) is the utilisation of the servers (the resources). At the Y-axis we can read the queuing factor as a function of Rho. So waiting times are very sensitive to the number of servers in a resource. However in most cases it is not difficult to determine how many servers we can avail of, but there are complicated examples like multiple CPUs with affinity. In order to preserve efficient usage of the cache, threads are assigned to one preferred processor. On the other hand if the preferred processor is in use by another process for too long of a time, then another processor may be assigned to the waiting thread. As a consequence threads do not have unlimited access to all processors. When mapping such a multiprocessor system to a queuing model, it is not easy to determine the number of (queuing-)servers. Still though the waiting times calculation may be sensitive to this, in most cases it is not difficult to determine the number of servers and the chance that we introduce a considerable error by taking the wrong number of servers is small.
Figure 9. Waiting time curves for various numbers of servers.
2.3.5 Number of sources
Figure 10 shows the sensitivity to the number of sources (or customers, or service requesting elements). There are curves for 100, 250, 1,000, 6,500 and ? sources. If the same amount of work is generated by various numbers of sources, the waiting times increase with the number of sources. In the mBrace approach, most of the modelling is done using the curves with infinite sources. We can see that the waiting times are not very sensitive for the number of sources as long as Rho (the utilisation) stays below 0.9. Even if we do not determine the number of sources correctly, which often occurs, the error is not important. Even using a value of ∞ is a practical solution that allows us to use simpler tools for calculation without much loss of accuracy.
Figure 10. Sensitivity for the number of sources
2.4 Scaling resources
Resources can be scaled horizontally and vertically. For the model, it is absolutely essential to have this capability too. Symmetric multiprocessing is a wide spread feature of servers. The model allows for calculating the changes in response times due to changing numbers of CPUs (horizontal scaling) or changing computing power of these CPUs (vertical scaling). The same goes for storage, memory (only horizontal scaling) and networks. Storage scales vertically by changing the access times of disks and horizontally by changing the numbers of disks.
Scaling is an essential capability of the model that not only allows us to study response times at changing capacities, it also allows us to collect measurement data for our studies on test environments that greatly differ from the eventual production environment. This not only gives us more room to conduct such studies, but also makes the test environments we need much cheaper.
2.5 Challenges for queuing models
There are aspects that are difficult to handle in modelling such as: the dispersions of arrivals and service times, paging, hit rates on caches, processor affinity, head of line queues etc. However in practice this turns out to be less of a problem than it seems at first sight. For the dispersions we take CV = 1, paging is ignored, the hit rates on the caches are set to 95% by default, processor affinity is ignored and head of line queues are approximated with models for other queuing disciplines.
This paper covers the modelling applied in the mBrace approach. The model used departs from the notion of a response time breakdown. The response time breakdown describing response times by the times transactions spend on the resources of the infrastructure and waiting times for these resources. The resource times are gained by measurements; the waiting times are calculated applying queuing theory. Attention was paid to the accuracy of the queuing models used and the conditions for applying these models. Simple elementary queuing models are applied. It was also shown how the response time breakdowns are calculated for each transaction type. Though the queuing calculations play a crucial role, most of the model is dedicated to calculating many simple rules.
4 Next part of the paper
The third and last part of this paper will be published in one of the next issues of MeasureIT. This part will treat the subjects Measuring, Validating and some examples of studies done with the mBrace approach.