mBrace - Part III
A pragmatic model driven approach for Performance Engineering of transaction processing systems
by Michael Kok
A careful balanced combination of measuring and modelling
The simulation model that describes response times and resource utilisations has a central place in the mBrace approach. Part I of this paper on the mBrace approach was published in the November issue of MeasureIT and shed light on the way the model assists in securing system performance. The example in Part I demonstrated a model filled with measured data. Creating such a model is not simply a matter of modelling or measuring; it is about the careful, balanced combination of the two. What and how we measure is determined entirely by the model, and the modelling is greatly influenced by the possibilities and limitations of measuring.
The main subject of Part II, published in the April 2006 issue of MeasureIT, was modelling. This included items like: the way the infrastructure maps to the response time breakdown in the model; the way queuing theory is applied to calculate waiting times after breaking down the system into elementary queues; Kendall's classification for determining the type of queue; the accuracy of the model; horizontal and vertical scaling of resources.
Figure 1. Treated in Part II. Applying queuing theory: determining waiting time
The following sections of Part III on the mBrace approach provide more insight into:
- the structured way measurement and other data are collected to fill the model;
- the two-step validation of the model that secures reliable outcomes;
- three examples of results obtained with the mBrace approach over a period of fifteen years, showing how the increased complexity of information systems is reflected in performance studies.
2 Data collection
The simulation model plays a central role in the mBrace approach. The main purpose of data collection is to fill the model and to enable its validation. Part II on modelling skimmed over much of the information involved in a performance study with the mBrace approach. Data collection involves not only measuring, but also gathering information from various sources. This information includes:
- Norms for response times
- The business process the system is used for
- The structure of the application transaction types, e.g. the use cases
- Base volume and system transaction volume
- Infrastructure characteristics, such as capacities
- Application characteristics, such as transaction profiles
Unfortunately, these items are not conveniently handed to the performance analyst in a structured way. Various persons with different areas of expertise have to contribute; a performance analysis really is a multidisciplinary endeavour.
2.1 Determining what measurement data are needed
One of the challenges of performance engineering is to determine what measurement data are needed. The next and main challenge is to acquire them with acceptable quality and at affordable cost. A model-based approach greatly simplifies the process of choosing the measurement data. One only needs the data that are input to the model. First, we need the arrival rates and the service demands, i.e. the times the transactions spend at the resources. The necessary data also includes CPU times, numbers of transfers to and from storage and amounts of memory used per transaction. In addition, we would like to know more about the probability distributions of these data, in particular their coefficients of variation.
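As an illustration, the model's input for one transaction type can be sketched as a simple data structure. The names below (TransactionProfile, service_demands) are invented for illustration and are not mBrace's actual formats:

```python
# Hypothetical sketch of the model's inputs: an arrival rate plus a
# transaction profile holding the service demand at each resource in the
# chain, optionally with coefficients of variation per resource.
from dataclasses import dataclass, field

@dataclass
class TransactionProfile:
    name: str
    arrival_rate: float                                   # transactions per second
    service_demands: dict = field(default_factory=dict)   # resource -> seconds
    cv: dict = field(default_factory=dict)                # resource -> coeff. of variation

    def single_user_response_time(self) -> float:
        """With a single user there is no queuing, so the response time
        is simply the sum of the service demands over the whole chain."""
        return sum(self.service_demands.values())

profile = TransactionProfile(
    name="order-entry",
    arrival_rate=2.0,
    service_demands={"app CPU": 0.002, "db CPU": 0.004, "disk": 0.010, "WAN": 0.050},
)
print(profile.single_user_response_time())  # 0.066 seconds
```

Filling such a structure for every transaction type is exactly what the measurement effort described below is for.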
A number of measurements have to be taken concurrently on several platforms using various measurement functions. Though the individual tasks are trivial, having these measurements done in concert by all the specialists involved is not a trivial endeavour at all. Practice has taught with a rough hand that this requires strong coordination; working with a measurement plan is a must. Various disciplines may be responsible for delivering their part of the measurement data, and the measurement plan is the vehicle for communication and control.
A measurement plan should at least cover:
- The processes for capturing and delivering the measurement data.
- The main components of the infrastructure, such as servers, LANs and WAN. For each server it includes, at a minimum, the usage of CPU, storage access (not disk space!) and memory.
The mBrace measurement plan consists of single user measurements and multiple user measurements.
Single user measurements and transaction profiles
Single user measurements yield the transaction profiles that we need as input for the model. A transaction profile consists of the service demands for all critical resources used by a transaction type: the number of incoming and outgoing packets and, for each server in the chain, the CPU usage, number of disk transfers and memory usage. For a CICS application, these items can be obtained easily with widely available mainframe tools. On UNIX and Windows platforms, however, this is less simple: the standard measurement facilities on these platforms do not produce these data at the transaction level. How can we determine, for example, how much CPU a specific transaction consumes? If we measure CPU while processing multiple transactions, we obtain all CPU usage for a certain period of time, but it remains unclear how much CPU each transaction consumed individually.

In the mBrace approach, resource consumption per transaction is captured in the single user measurement session. Transactions are processed one by one in a test environment, with no other processing allowed on that environment at the same time. Again we obtain all resource usage in the test system, but since only one transaction has been processed, the measured resource usage pertains to that single transaction.

So far this sounds straightforward and simple, but in practice there are many complications. At first sight, one would assume that the measurements show no utilisation when no transactions are processed and that activity shows up clearly when single transactions are processed. Reality appears to be different, as pointed out next.
There may be a considerable amount of utilisation while no transactions are processed, i.e. the nil load. The Java Runtime Environment is an example of a platform that may consume considerable CPU while no transactions are processed, while little or no extra usage may be measured when one transaction is processed.
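A minimal sketch of attributing CPU to a single transaction despite the nil load (this is an illustration of the idea, not mBrace's actual tooling): subtract the usage the idle platform consumes on its own from the usage measured while the one transaction runs.

```python
# Illustrative: per-transaction CPU from interval samples, corrected for
# the nil load measured in an idle window of the same test environment.

def cpu_per_transaction(busy_samples, idle_samples, interval_s):
    """busy_samples / idle_samples: CPU utilisation fractions sampled at
    fixed intervals during, respectively before, the single-user run."""
    busy_cpu = sum(busy_samples) * interval_s   # CPU-seconds while the transaction ran
    nil_rate = sum(idle_samples) / len(idle_samples)
    nil_cpu = nil_rate * len(busy_samples) * interval_s  # background over the same window
    return max(busy_cpu - nil_cpu, 0.0)         # what the transaction itself used

# An idle JRE burning ~5% CPU; during the run we see that 5% plus the real work.
idle = [0.05, 0.05, 0.05, 0.05]
busy = [0.05, 0.25, 0.15, 0.05]                 # 1-second samples
print(cpu_per_transaction(busy, idle, 1.0))     # roughly 0.3 CPU-seconds
```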
Measuring synchronous and asynchronous resource usage
When a transaction is processed, the measurements may include not only the processing of that transaction but also various other activities; measured CPU usage may be a multiple of what the transaction really consumes. Synchronous usage is the usage caused directly by the transactions. Asynchronous usage is caused by various other activities on the components of the infrastructure and is only indirectly related to the transactions. On the application and database servers this is typically caused by housekeeping activities such as garbage collection, paging, heartbeat checks, replication, taking checkpoints, flushing of buffers and caches, etc. On the networks, asynchronous usage may be caused by network management traffic. All these asynchronous activities can cause an extra load that is constant, proportional to the transaction volume, or a bit of both. To get a grip on these matters, we have to use the measurement data taken at higher transaction volumes combined with knowledge of the system.
Accuracy of CPU measurement
Another obstacle is the accuracy of CPU usage measurement, whose resolution seems to remain invariant while processors become faster and faster. A transaction may use 2 milliseconds of CPU on a certain server. After the server is upgraded a couple of years later, the same transaction type may consume 0.4 milliseconds on CPUs that are 5 times as fast. The resolution of the measurement, however, remains 1 millisecond, so the measurement reports 0 milliseconds of CPU usage. Measuring a set of 3 transactions fired at the same time may solve this, but it also introduces new complexity.
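The batching trick can be sketched as follows (the batch size here is chosen for illustration; the text mentions a set of 3):

```python
# Sketch: when per-transaction CPU falls under the measurement resolution,
# fire N identical transactions together and divide the measured total.
# The effective granularity improves from resolution to resolution / N.

def cpu_per_txn_batched(measured_total_ms, n_transactions, resolution_ms=1.0):
    estimate = measured_total_ms / n_transactions
    granularity = resolution_ms / n_transactions
    return estimate, granularity

# A 0.4 ms transaction reads as 0 ms on its own; 25 of them read as ~10 ms.
est, gran = cpu_per_txn_batched(measured_total_ms=10.0, n_transactions=25)
print(est, gran)   # 0.4 0.04
```

The new complexity the text mentions is real: the N transactions must be truly identical, and their concurrent execution may itself change caching and scheduling behaviour.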
Multi-user measurements are necessary to determine how asynchronous resource usage develops with the transaction volume, and also for validation purposes; validation is treated in the next chapter. Multi-user measurements are conducted at three to five levels of transaction volume, from which the development of asynchronous resource usage can be determined.
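With measurements at several volume levels, the constant and the volume-proportional parts of the load can be separated by a simple least-squares fit. This is a sketch of the idea under the linear assumption stated above (constant plus proportional); the function name and numbers are illustrative:

```python
# Fit utilisation = a + b * volume over the multi-user test levels.
# The intercept a is the constant (asynchronous / nil) part; the slope b
# is the utilisation added per unit of transaction volume.

def fit_line(volumes, utilisations):
    n = len(volumes)
    mean_x = sum(volumes) / n
    mean_y = sum(utilisations) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(volumes, utilisations))
    sxx = sum((x - mean_x) ** 2 for x in volumes)
    b = sxy / sxx              # per-transaction share of the utilisation
    a = mean_y - b * mean_x    # load still present at zero volume
    return a, b

# Four test levels: transactions/s versus measured CPU utilisation.
a, b = fit_line([5, 10, 15, 20], [0.15, 0.25, 0.35, 0.45])
print(round(a, 3), round(b, 3))   # 0.05 0.02
```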
Normally measurement overhead, i.e. the resource consumption of the measurement processes themselves, is negligible. However, this may not be the case when taking measurement data at short intervals, as we may do with single user measuring. In that case the measurement overhead must be determined and deducted from the measurement data.
Exclusive use of the test environment
In practice, it appears to be difficult to obtain exclusive use of a test environment for single user measurements. Other testers on a project may feel a lot of pressure to continue their work, and an appointment made for the use of the test environment is easily forgotten. In this respect, it is amazing how limited the technical means are to enforce exclusive use of a test environment. Apart from temporarily removing all users from the user management of all systems, it is quite difficult to block usage by unwanted users.
Generating test load
The transactions can be fired using a load generator like JMeter, QALoad, LoadRunner, etc. This requires scripting, which takes time, but it delivers accurate response time measurements. As an alternative, the transactions may be fired by hand. This saves considerable preparation time, but the hand-clocked response times are somewhat less accurate. The multi-user tests can also be done manually by a group of testers, since each tester can easily fire ten times as many transactions as a user commonly does in production. In most cases this is an adequate alternative.
Many organisations do not allow measurement software to be installed on their systems because of security considerations. Therefore agent-less measuring is an important capability: the standard measurement functions of the operating systems and application middleware are read at short intervals while the transactions are processed. The resulting measurement data are then related to the processed transaction types using special parsing software.
The following measurement tools are commonly used: RMF and TMON on zSeries; SAR, NMON, etc. on UNIX machines; Perfmon on Windows platforms. Sniffers are used for measurements on networks.
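The parsing step can be sketched roughly as follows. The log layout and timestamps below are invented for illustration; real sar, RMF or Perfmon output differs per tool, which is exactly why the parsing software is tailored per study:

```python
# Illustrative: relate timestamped utilisation samples (as an agent-less
# interval measurement might deliver them) to the window in which each
# single-user transaction was fired.
from datetime import datetime

samples = [  # (timestamp, CPU utilisation fraction)
    (datetime(2006, 5, 1, 10, 0, 0), 0.02),
    (datetime(2006, 5, 1, 10, 0, 5), 0.30),
    (datetime(2006, 5, 1, 10, 0, 10), 0.28),
    (datetime(2006, 5, 1, 10, 0, 15), 0.03),
]
txn_windows = {  # when each transaction was fired and when it completed
    "T1": (datetime(2006, 5, 1, 10, 0, 4), datetime(2006, 5, 1, 10, 0, 12)),
}

def usage_for(txn, windows, samples):
    """Return the utilisation samples that fall inside a transaction's window."""
    start, end = windows[txn]
    return [u for t, u in samples if start <= t <= end]

print(usage_for("T1", txn_windows, samples))  # [0.3, 0.28]
```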
2.2 Collecting other data
Apart from measurement data, various other data must be collected such as:
- The structure of the application and its place in the overall architecture.
- Which transactions have interfaces to other applications, and to which applications?
- The business transactions or use cases that are frequently used.
- The system transactions that are frequently used.
- The types of transactions: e.g. some applications have a software robot that may fire a series of cascaded transactions.
3 Validation of the model
Validation is done in two steps:
- Checking time spent at resources against single user response times.
- Checking calculated resource utilisations and response times against measured resource utilisations and response times at higher transaction volumes.
Validation 1: time spent at resources against single user response times
In the mBrace approach the transactions are measured in the sequence of the business process. The service demands of each transaction on all resources and the single user response time are measured simultaneously in order to produce the transaction profiles. This is done while no other work is performed in the test environment, so we can be sure that our measurements cover only the work done by the transactions fired. However, apart from the processes immediately involved in processing the transactions, the operating system still has several resource-consuming processes running. While we are measuring transaction processing, other activities, such as taking a database checkpoint or garbage collection, may take resources at the same time. If one of these activities interferes with the transaction processing we are measuring, we can have a significant error in our analysis. To be able to correct such errors, the measurement of the service demands is repeated in six iterations. Experience shows that this is sufficient to secure reliable results in an efficient way; apparently such coincidental interference occurs infrequently.
In the first validation step we compare the six measurements of each transaction against each other assuming that those measurements deviating from the majority pattern have been polluted by incidental activities of the operating system.
The next figure illustrates this for an example with two use cases. The figure shows seven graphs: the leftmost graph shows the average values of the measured and calculated response times, while the other graphs show the measured and calculated response times for each individual iteration. For each transaction there is a coloured bar in each of the seven graphs. Each bar is blue and yellow, or blue and red; the yellow and red parts reflect the residuals.
A residual is the difference between the measured and calculated single user response time; in other words, it is the unexplained part of the measured single user response time. The measurements of a transaction show a positive residual when the calculated response time is smaller than the measured one. In the graph, a positive residual is coloured yellow; a red part designates a negative residual.
Measurements that differ too much from the average pattern are eliminated. The measurement data that remain yield the average values for the service demands in the transaction profiles. The second graph shows two transaction types, numbers 6 and 14, that deviate too much from the pattern of the other five iterations; the crossing bold red lines mark them. Transaction type 14 has already been discarded; transaction type 6 is still there but will be discarded next. So five out of the six measurements of these transactions are retained.
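The elimination step can be sketched with a simple filter. Median-based filtering and the tolerance value are assumptions for illustration; the paper does not prescribe the exact statistic:

```python
# Illustrative: compare repeated single-user measurements of one
# transaction and drop iterations that deviate too much from the
# majority pattern (here: more than 50% away from the median).
from statistics import median

def keep_consistent(measurements, tolerance=0.5):
    m = median(measurements)
    return [x for x in measurements if abs(x - m) <= tolerance * m]

# Six single-user response times (s); one iteration hit a DB checkpoint.
runs = [0.81, 0.79, 0.80, 2.45, 0.82, 0.78]
kept = keep_consistent(runs)
print(kept)                  # the polluted 2.45 s iteration is discarded
print(sum(kept) / len(kept)) # average of the five retained measurements
```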
The remaining measurements now provide a secure basis for reliable response time breakdowns.
Validation 2: comparing resource utilisations and response times at higher transaction volumes
The calculation of waiting times is sensitive to errors in the utilisations, especially at higher transaction volumes. A small error in a service demand can cause a significant error in the resource utilisation at the target transaction volume, and this in turn can cause a large error in the waiting time calculated for that resource.
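A worked illustration of this sensitivity, using an M/M/1-style waiting time as an assumption (the model's actual queue types vary per resource, as discussed in Part II):

```python
# W = S * rho / (1 - rho): the waiting time blows up as the utilisation
# rho approaches 1, so small service-demand errors get amplified.

def waiting_time(service_s, arrival_rate):
    rho = arrival_rate * service_s          # utilisation
    assert rho < 1, "resource saturated"
    return service_s * rho / (1 - rho)      # M/M/1 waiting time in seconds

lam = 450.0                                 # transactions per second
w_true = waiting_time(0.0020, lam)          # rho = 0.900
w_off = waiting_time(0.0021, lam)          # 5% service-demand error: rho = 0.945
print(round(w_true * 1000, 2), round(w_off * 1000, 2))  # 18.0 36.08 (ms)
```

A 5% error in the service demand thus roughly doubles the calculated waiting time at this utilisation, which is exactly why this second validation step is needed.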
Total resource utilisation consists of both synchronous and asynchronous resource usage; for the transaction profiles, the two are split. The resource utilisations reflecting total resource usage at certain transaction volumes are measured during a multi-user test. With the model, we calculate the resource utilisations at the same volumes and compare them to the measured utilisations. Differences may have several causes: errors in the measurement process, incidental interference from various system functions, testers at work where they should not be, errors in the parsing, etc. On one occasion we found that batch processing running every 10 minutes interfered with the measurements; no one involved in the application development was aware of it, and only one person could explain the phenomenon. Furthermore, on modern platforms it is hard to predict how asynchronous usage grows with the transaction volume.
The validation is conducted in a test environment while there is no other activity in that environment (thus no noise), so the utilisations from the environment are all kept at zero in the model.
When we take measurements for a couple of values of the transaction volume we can analyse how asynchronous usage develops with the transaction volume. The resource usages of the transactions are corrected to obtain a fit. At the same time, the measured response times are compared with the response times from the model.
Figure 3 compares the resource utilisations before (a) and after (b) correction. The green rods, marked Application Xm in the legend (m stands for measured), show the resource utilisations measured in the load test for the validation. The blue rods (Application Xv, v stands for validated) reflect the utilisations calculated with the model for the same transaction volume.
In the last validation step, the measured and calculated response times are also compared. If there are significant differences, additional investigation is necessary to sort out the problem.
The next figure shows the mBrace DNA-profiles of an application in a comparison of the response time breakdowns before (a) and after (b) correction. As you can see from the upper response time breakdowns in figure 4, the maximum response time increased from 2.2 seconds to 3.0 seconds.
Figure 4. Response times and utilisations before and after correction from the validation at the target transaction volume compared.
After the two validation steps, the model is firmly secured for reliability and it is ready for use.
4 Examples
The mBrace approach has developed over quite a period of time. Though the dashboard in its current format was constructed only 5 years ago, measurement data and results from earlier performance studies are still available. DNA-profiles of applications investigated much earlier have been constructed to show some examples through time. It is fun and educational to see how the performance of systems has evolved and how applications from 15 years ago would perform on today's infrastructures.
4.1 Example 1: 1990, CICS - DB2
The system was a CICS-DB2 application for a government agency with 200 users. The infrastructure was based on an IBM 3090-500S connected to the user location via a wide area network with a trunk connection at 28,800 bits/sec. The purpose of the study was to secure the successful rollout of the system. This was the second study based on an early version of the mBrace approach. One month prior to cutover, the analysis showed that the rollout could be secured by increasing the capacity of the trunk line by installing a second one. The response times were estimated at an average of 3.1 seconds and a 90-percentile value of 4.7 seconds. One year later, when some of the users (120) were busy with the initial input of data, a sample measurement of response times showed an average of 2.6 seconds and a 90p value of 5.0 seconds.
The above picture shows the dashboard selection of DNA-profiles for three situations:
- The expected response times and resource utilisations in the original state.
- The response times and resource utilisations after the trunk line was upgraded. Response times are 1 second lower and the utilisation of the trunk line (PR1) drops from 95% to 40%.
- The response times of the same system projected on an infrastructure typical for 2005. Response times are all lower than 0.7 seconds. Utilisations have become negligible.
4.2 Example 2: 1993, ERP system on UNIX
The system had an application package that supported the main business functions of the company. The ultimate infrastructure was based on a UNIX server with four Intel 486 CPUs at 50 MHz. Trunk lines of 14,400 bits/sec with multiplexers connected 200 users dispersed over 20 locations. The study was done after a project crisis about poor response times eight months prior to cutover and was repeated for confirmation one month before cutover. The purpose of the study was to solve the problem and secure system performance after cutover. The investigation showed that there was a lack of CPU capacity. The server was replaced, increasing CPU capacity by a factor of 4: the original server had four loosely coupled CPUs, while the new one had 4 tightly coupled CPUs that were 4 times as fast. The norm set by the customer was: "average response time lower than 3 seconds". Hardware was chosen on the basis of the study such that the average response time would be 3 seconds. After cutover, sample response time measurements were done with stopwatches, showing an average of 3 seconds. Responsiveness was accepted.
The pictures show the DNA-profiles of the dashboard selection for three situations:
a) The original state reconstructed with the dashboard. Lack of CPU capacity. Outcome of the model (not verified): average response time: 4.7 seconds, 95p: 14 seconds.
b) Estimated at cutover. Some response times were still relatively high, but average response time was estimated at 2.3 seconds, 95p: 6.4 seconds. After cutover the CPU-utilisation turned out to be 20% higher. Average response time was 3 seconds.
c) Response times and resource utilisations estimated for the same system on an infrastructure typical for 2005. Because of the much faster infrastructure, not only do response times reduce drastically, but also resource utilisations. Because of the shorter response times, dynamically used memory could even be reduced. This would increase memory utilisation again. Average response time: 0.1 seconds, 95p: 0.4 seconds.
The system is still operational, though it has been thoroughly changed; nowadays PCs with a graphical user interface replace the original VT100 terminals. If we analysed the system again as it is today, the figures would definitely show much more service demand on the resources and higher response times than the third picture shows.
4.3 Example 3: 2006
Example 3 is derived from an advanced, highly complex web-based financial system. The legend of the response time breakdown graph reveals that the transactions pass through an infrastructure chain of at least 5 servers, with all mainstream operating systems represented. Not all transaction types comply with the norm of 2 seconds. Amazingly, the system is still completely accepted by, and even popular with, its users, thanks to the added value of its strong functionality.
4.4 Conclusion for examples
When compared, the examples show a number of interesting aspects:
Obviously, systems have grown considerably more complex, as can be seen from the legends of the response time breakdown graphs.
Resource consumption per transaction has increased a great deal apparently because a lot more work is done in these transactions. The early systems demanded quite some attention to make them perform; the contemporary systems demand even more attention. It is amazing how well the old systems would perform on contemporary hardware.
Though hardware has become considerably faster, the challenge of making systems perform seems to remain the same. Clearly, there is always a race going on between hardware and software development, and there probably always will be.
5 Conclusion
This third and concluding part of the paper covered data collection for the model and validation of the model. Data collection includes gathering information about requirements and service levels, business volumes and the mapping of system transactions onto business processes. Most of the data collection effort goes into producing measurement data. Measurements are rather complicated, even more so when agent-less measuring is required.
There are two validation steps: single user and multi-user. At single user there is negligible utilisation and consequently no queuing; thus the measured single user response time does not include any waiting and is described by summing the resource usage times over all resources in the chain.
The mBrace approach
The approach allows us to obtain a complete overview of a system's performance in a relatively short time and to ensure a successful application rollout. The simulation model calculates response times for varying transaction volumes and changing capacities of the infrastructure. Applying queuing theory in a simple and rough way enables us to gain insight into how response times develop at changing transaction volumes. This allows us to model system performance and capacity on extensive and complicated infrastructure chains.
The way measurement data are collected is adjusted for the model. The interrelationship between measuring and modelling is carefully managed. Creating and implementing a measurement plan is an important part of the approach. The model filled with the measurement data is extensively validated to obtain reliable results.
The mBrace approach includes a method, techniques and tools and has the following features:
- With its application advice, showing the breakdown of all transaction types of the application, and its capacity advice, showing the required scales of all infrastructure resources, the mBrace approach provides all information needed to secure the performance of a newly developed system prior to cutover.
- Applicable from the moment the application is available in a test environment.
- Model driven.
- Capable of handling any infrastructure configuration.
- Capable of handling any application software configuration.
- Makes use of ubiquitous measurement functions, such as RMF, SAR and Perfmon.
- Parsing and modelling tools are tailored to the system analysed.
The mBrace approach was used and developed over a 15-year period. The most amazing aspect of that period is the growth in the degree of system complexity.
Special thanks to: Rob van der Wouw of Eureka Unlimited, my highly valued partner in the most recent mBrace performance analyses, who provided crucial contributions by developing and applying parsing and agent-less measurement techniques, and my son Tjerk Kok, who helped develop measurement techniques for the mBrace approach in its early stages.
I feel much obliged to Dr. Ruben de Leeuwe who showed me the first modelling examples quite a while ago and Dr. Michiel van Hoorn who gave me various helpful hints in the field of queuing theory.
Various friends took the effort to work through this paper and provided feedback: Barry Sokolik, editor for MeasureIT; Monique Krinkels and Hajo Strik of Bluebird Publishers; Barry Rozemeijer, Andrés van Staveren, Ronald Boon, Koos de Mooij, Ben Winnemuller of ING Bank; Dr. Ruud de Boer; Ronald Schut of Contentional IT Performance Architects. Thanks a lot!