MeasureIT

Guerrilla Capacity Planning
PART I: Hit-and-Run Tactics for Website Scalability
April 1, 2003
by Neil J. Gunther

About the Author
Neil J. Gunther, Performance Dynamics Consulting (SM)

Neil Gunther, M.Sc., Ph.D. is an internationally recognized consultant who founded Performance Dynamics Company (www.perfdynamics.com) in 1994. Prior to that, Dr. Gunther applied his training in theoretical physics to research and management positions at San Jose State University, JPL/NASA (Voyager and Galileo missions), Xerox PARC and Pyramid/Siemens Technology. His computer performance analysis and capacity planning classes have been given at both corporate and academic institutions including AOL, Boeing, FedEx, Motorola, Stanford University, Sun Microsystems, SAGE-Australia and Thales Group (Holland).

Dr. Gunther is the author of numerous papers on computer performance, as well as three books: THE PRACTICAL PERFORMANCE ANALYST, McGraw-Hill (1998), ANALYZING COMPUTER SYSTEM PERFORMANCE WITH PERL::PDQ, Springer-Verlag (2005), and GUERRILLA CAPACITY PLANNING, Springer-Verlag (2006). He is well-known to CMG (Computer Measurement Group) audiences for his presentations since 1993, and his very popular articles in the CMG MeasureIT online magazine. In 1996 Dr. Gunther was awarded Best Technical Paper at CMG and in 1997 he was nominated for the A.A. Michelson Award.

Performance Dynamics has recently embarked on joint research into QIT (Quantum Information Technology) and Dr. Gunther has developed a theory of "qubit bifurcation", which is being tested experimentally. Since Dr. Gunther has Dirac number 2 (his M.Sc. supervisor was Prof. C. J. Eliezer; one of Dirac's few research students) and his Ph.D. was awarded in the UK for studies in quantum field theory and phase transition phenomena, he is well-equipped to explore the new frontier between classical IT and QIT.

Dr. Gunther was born in Melbourne, Australia and is a member of the AMS, APS, ACM, CMG, IEEE, and INFORMS.

1  Introduction

We so-called performance experts have a tendency to regurgitate certain performance cliches to each other, and to anyone else who will listen. Cliches like:

  1. Acme Corporation just lost a $40 million sale because their new application cannot meet service level targets under heavy load. How much money do they need to lose before they do capacity planning?

  2. Company XYZ spent a million dollars buying performance management tools but they won't spend $10 thousand on training to learn the capacity planning functionality. They just produce endless strip charts without regard for what that data might imply about their future.

Several years ago I stopped mindlessly reiterating statements like these and took a hard look at what was happening around me. It was then that I realized not only were people not gravitating towards capacity planning, they actually seemed to be avoiding it at any cost! From this standpoint, we performance experts appeared more like clergy preaching from the pulpit after the congregation had well and truly vacated the church.

In trying to come to grips with this new awareness, I discovered some unusual reasons why capacity planning was being avoided. Later, I began to ponder what might be done about it and presented some of those ideas at the 1997 CMG Conference [Gunther 1997].

My thinking has evolved over the past several years [Gunther 2002] and I would like to share my current perspective with you in this article. Since I see performance management differently from most, you may find my conclusions rather surprising and, perhaps, inspiring.

2  Doing More with Less

Traditional capacity planning has long been accepted as a necessary evil for mainframe [Samson 1997] and data network procurement [Cockcroft and Walker 2001]. The motivation in the past was simple; the hardware components were expensive and budgets were limited. Therefore, the expenditure of those dollars required careful and time-consuming analysis.

Nowadays, however, hardware has become relatively cheap, even mainframe hardware! The urge to launch an application with over-engineered hardware has to be tempered with the less obvious caution that bottlenecks are more likely to arise in the application design than in the hardware configuration. Simply throwing more hardware at performance problems will not necessarily improve performance. So, some kind of analysis and planning may still be required even if you have all the hardware in the world.

To make matters worse, we now live in the brave new world of distributed component-based computing and web-based architectures where we have many software pieces in many hardware places. In stark contrast to the traditional style of capacity planning for monolithic mainframes, we have a huge number of incompatible variants to contend with:

  • Little or no instrumentation in third-party or in-house applications.
  • No such thing as UNIX! There's: AIX, HP-UX, Solaris, BSDI, FreeBSD, RH Linux, Debian Linux, MacOS X, ...
  • There's not even one kind of Windows operating system anymore.
  • Scripts built on one UNIX variant almost invariably do not work on another.
  • Multiple COTS (Commercial Off-The-Shelf) applications running on multiple vendor platforms.
  • Component-based software: Java, .NET, CORBA, ODBC, enterprise beans, etc.
  • No common performance metrics like RMF (Resource Measurement Facility) or SMF (System Management Facilities) available on MVS mainframes.
  • Most commercial tools have mainframe roots and thus tend to be server-centric in their data collection capabilities. Additional tools are needed for network and application data.
  • There's no convenient way to comprehend resource consumption across multiple tiers.

This makes analysis and planning of web sites far more difficult than it needs to be.

In an attempt to ameliorate some of these challenges, CMG has stood behind the development of performance measurement and management standards like UMA (Universal Measurement Architecture) and ARM (Application Response Measurement), now owned by The Open Group. Unfortunately, the vast panoply of tool and platform vendors has not been convinced that there is revenue in these standards, so they have not really caught on in the industry (see footnote 2).

In summary then, we are building more complex architectures with less instrumentation available to manage them. This is a very risky approach which seems to be sanctioned in software engineering [Smith and Williams 2002] in a way that would not be acceptable in most other engineering disciplines. I don't know about you, but I'm glad Boeing doesn't build aircraft in such a risky way! Since this is such a source of frustration for fostering performance analysis and capacity planning, let's try to understand why high risk is acceptable in the context of software engineering.

2.1  As Long as It Fails on Time!

Some managers believe they don't need to bother with capacity planning. How many times have you heard this response? I believe it is based on a false perception of risk: someone else will lose $40 million because of poor performance, not me. See sidebar 2.1.1 on Risk Management vs. Risk Perception for an explanation of why this inverted logic runs so deep.

Management is generally employed to control schedules. To emphasize this fact to my students [Gunther 2001], I tell them that managers will even let a project fail, as long as it fails on time! Many of my students are managers and none of them has disagreed with me yet. What this means is that managers are often suspicious that capacity planning will interfere with project planning. Under such scheduling pressures, the focus is on functionality first. Unfortunately, new functionality is often over-prescribed because it is seen as a competitive differentiator. All the development time therefore tends to be absorbed by implementing and debugging the new functionality. In this climate, applications often fail [Ackerman 2002] to meet performance expectations [Smith and Williams 2002] as a result of management pressure to get the new functionality to market as fast as possible.

2.1.1  Risk Management vs. Risk Perception

Consider the poor fellow driving to the airport with white knuckles because he just saw a news report on CNN about a plane crash and now he's fretting over the safety of his own flight. What's wrong with this picture?

Statistics tell us that he has a greater risk of being killed on the freeways than the airways (by a factor of 30 or more). Our traveller has also heard these same statistics on television. So, why doesn't he remind himself of this important fact and look forward to his flight, in spite of there being an air disaster that day? Try it some time. It doesn't work. It's a psychological issue, not one of rational thought. On the freeway, our intrepid driver feels like he is in control because he has his hands firmly on the steering wheel. But on the aircraft, he is just another fearful passenger strapped into his seat. This fear is registered at a deep personal level of (false) insecurity. He remains oblivious to the possibility that he could have been completely obliterated by another careless driver on the freeway.

And that is the essential difference between risk perception and risk management. Managers are paid to be in control. Therefore, bad things will not happen to the project they are managing because that would imply they are not really in control. Incidentally, our nervous traveller's best strategy is actually to fly to the airport!

Let's face it, Wall Street (see footnote 3) still rules our culture. Time-to-market dictates the schedules that managers must follow. This is a fact of life in the new millennium and a performance analyst or capacity planner who ignores that fact puts his or her career in peril. So, not only are we supposed to do more with less, we're supposed to do it in less time! In view of these seemingly insane constraints, it is imperative that any capacity planning methodology not inflate project schedules.

2.2  The Performance Homunculus

Performance management can be thought of as a subset of systems management activities. Systems management includes activities like:

  • Backup/recovery
  • Chargeback
  • Security
  • Distribution of software
  • Performance management

Looked at in this way, performance management is simply another bullet item. But this is another of those risk misperceptions. In terms of complexity, it requires the most significant skill levels. It's rather like the difference in medicine between the torso and the homunculus.

Indicating the location of an ailment to your doctor has meaning because your body (torso) is referred to in geometric proportion. The homunculus, on the other hand, represents the sensate proportion of our bodies. Reflecting this sensory weight (see Figure 1), the hands and the mouth become huge whereas the thorax and head appear relatively small. This is because we receive vastly more sensory information through our fingers and tongue than we do via the skin on our chest, for example.

Figure 1: The homunculus.

The same proportionality argument can be applied to performance management skills.

Performance management skills are to the homunculus as systems management skills are to the torso.

Almost every other item in the list above can be accommodated by purchasing the appropriate COTS package and installing it. Not so for performance management.

In terms of coverage, performance management can be broken into three major areas:

  1. Performance monitoring
  2. Performance analysis
  3. Performance planning

Most attention is usually paid to level 1, performance monitoring, because it is generally the easiest to address. If you want to manage performance and capacity, you have to measure it. Naturally, this is the activity that the majority of commercial tool vendors target. As a manager, if you spend $250,000 on tools, you feel like you must have accomplished something. Alternatively, UNIX and NT system administrators are very good at writing scripts to collect all sorts of data as part of their system administration duties. Since almost nobody sports the rank of Performance Analyst or Capacity Planner on their business card these days, that job often falls to the system administrator as part of the systems management role. But data collection just generates data. The next level (2) is analysis. The usual motivation for doing any analysis these days is to fire-fight an unforeseen performance problem that is impacting a release schedule or deployed functionality. With a little more investment in planning (level 3), those unforeseen "fires" can be minimized. But level 3 is usually skipped for fear of inflating project schedules. How can this Gordian knot be cut?

3  Guerrilla Capacity Planning

In my view, a more opportunistic approach [Tabor 1970] to capacity planning is needed. Enter guerrilla capacity planning!

Figure 2: G-u-e-r-r-i-l-l-a, not gorilla.

The notion of tactical planning may seem self-contradictory. At the risk of mixing metaphors, we can think of traditional capacity planning as being the 800-pound gorilla! That gorilla needs to go on a diet to produce a leaner approach to capacity planning that is compatible with the modern business environment described in Section 2.1. By lean, I don't mean skinny.

Skinny would be like remaining stuck at level 1 where there is a tendency to simply monitor everything that moves in the false hope that capacity issues will never arise and thus, planning can be avoided altogether. Monitoring requires that someone watch the 'meter needles' wiggle. Inherent in this approach is the notion that no action need be taken unless the meter redlines. But performance 'meters' can only convey the current state of the system. Such a purely reactive approach does not provide any means for forecasting what lies ahead. You can't forecast the weather by listening to leaves rustle.

The irony is that a lot of predictive information is likely contained in the collected monitoring data. But, like panning for gold, some additional processing must be done to reveal the hidden gems about the future. Keeping in mind the economic circumstances outlined earlier, moving to levels 2 and 3 must not act as an inflationary pressure on the manager's schedules. Failure to comprehend this point fully is, in my opinion, one of the major reasons that traditional capacity planning methods have been avoided.

3.1  It's Not a Model Railway

The goal of capacity planning is to predict ahead of time that which cannot be known or measured now. Prediction requires a consistent framework in which to couch the assumptions. That framework is called a model. The word "model", however, is one of the most overloaded terms in the English language. It can mean everything from a model railway set to the model, Cindy Crawford. Consider the model railway. The goal there is to cram in as much detail as the scale will allow. The best model train set is usually judged as the one that includes not just a scale model of the locomotive, and not just a model of an engineer driving the scaled locomotive, but the one that includes the pupil painted on the eyeball of the engineer driving the scaled locomotive!

This is precisely what a capacity planning model is not. For capacity planning, the goal is to discard as much detail as possible while still retaining the essence of the system's performance characteristics. This tends to argue against the construction and use of detailed simulation models, in favor of the use of spreadsheets or even automated forecasting. The skill lies in finding the correct balance. Linear trending models may be too simple in many cases while event-based simulation models may be overkill. To paraphrase Einstein: Keep the model as simple as possible, but no simpler.
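
As a minimal sketch of the simplest kind of model mentioned above, a linear trend, the following Python fragment fits a straight line to periodic utilization samples and estimates how long it will be before the trend reaches a planning threshold. The function names, the weekly sampling interval, the sample values and the 75% threshold are all illustrative assumptions; they are not taken from any particular tool or measurement.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def weeks_until(utilization, threshold=0.75):
    """Weeks from the latest sample until the fitted trend reaches threshold.

    Returns None when the trend is flat or falling, and 0.0 when the
    fitted line has already reached the threshold.
    """
    weeks = list(range(len(utilization)))
    slope, intercept = fit_line(weeks, utilization)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - weeks[-1])


# Hypothetical weekly CPU utilization samples (fractions of capacity).
samples = [0.40, 0.42, 0.44, 0.46, 0.48]
remaining = weeks_until(samples)
print(f"roughly {remaining:.1f} weeks of headroom on the current trend")
```

A straight line is used here only because it is the smallest model that can say anything about the future. If the measured data curve upward, as they often do near saturation, this sketch will be too optimistic, which is exactly the sense in which linear trending may be too simple.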

3.2  No Compass Required

Traditional capacity planning has required relatively high precision because many thousands of dollars were attached to each significant digit of the calculation. In today's economic climate, however, managers usually just want a sense of direction rather than the actual compass bearing. In this sense, the precision of capacity predictions has become less important than their accuracy. There is little virtue in spending two months debugging and verifying a full-blown simulation if the accuracy of a simple spreadsheet model will suffice.
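
One way to act on this is to report a forecast with only as many significant digits as the underlying data can justify. The fragment below is a small sketch of that idea; the helper name round_sig and the sample forecast value are hypothetical.

```python
from math import floor, log10


def round_sig(value, digits=1):
    """Round value to the given number of significant digits."""
    if value == 0:
        return 0.0
    exponent = floor(log10(abs(value)))
    return round(value, digits - 1 - exponent)


# Hypothetical raw output of a spreadsheet-style forecast.
raw_weeks = 13.4827
print(f"model output: {raw_weeks} weeks")
print(f"reported as:  roughly {round_sig(raw_weeks, 2):g} weeks")
```

The extra decimal places in the raw number are precision without any added accuracy, so dropping them loses nothing a manager can use.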

At a more technical level, there is little support for high-precision measurements in open systems. Take UNIX, for example. It's basically an experiment that escaped from the lab circa 1975 and has been producing mutants ever since. What little performance instrumentation exists was originally implemented in the UNIX kernel for the benefit of the early developers, not for the grand purpose of capacity planning. Nonetheless, every capacity planning tool in existence today primarily relies on those same kernel counters with little modification. And since the PC revolution of the 1980s, performance management has become ad hoc, at best.

4  Summary

To summarize our theme so far: time is money, more so today than in Benjamin Franklin's time. Web sites are distributed and more complex than in mainframe days. Therefore, the traditional approach to capacity planning can no longer be supported. And it's not really about hardware anymore. The new emphasis is on software scalability and that impacts the way capacity planning should be approached.

The key idea presented here is tactical planning but there are at least ten ways in which guerrilla capacity planning differs from traditional capacity planning.

Item         Traditional     Guerrilla
Budget       Big             None
Tools        Big             Tiny
Time scale   Strategic       Tactical
Approach     Passive         Proactive
Title        Business card   No badge
Schedule     Inflationary    Deflationary
Scope        Routine         Opportunistic
Reporting    Expected        Unexpected
Skill set    Narrow          Diversified
Focus        Hardware        Applications

Guerrilla capacity planning tries to facilitate rapid forecasting of capacity requirements based on available performance data in such a way that management schedules are not inflated. In PART II, I'll give some examples of how Guerrilla Capacity Planning can work for you in the context of making scalable web sites.

References

[Ackerman 2002]
Ackerman, E., "Waging a Battle Against PC Bugs," SiliconValley.com, posted Jan. 26, 2002.
[Cockcroft and Walker 2001]
Cockcroft, A. and Walker, W. Sun Blueprints: Capacity Planning for Internet Services, Prentice-Hall, 2000.
[Dumke et al. 2001]
Dumke, R., Rautenstrauch, C., Schmietendorf, A., Scholz, A. (Eds.), Performance Engineering: State of the Art and Current Trends, Springer Lecture Notes in Computer Science, #2047. Heidelberg: Springer-Verlag, 2001.
[Gunther 1997]
Gunther, N. J., "Shooting the RAPPIDs: Swift Performance Techniques for Turbulent Times," Proc. CMG'97, 602-613.
[Gunther 2001]
Gunther, N. J., Lecture notes for Guerrilla Capacity Planning course.
[Gunther 2002]
Gunther, N. J., "Hit-and-Run Tactics Enable Guerrilla Capacity Planning," IEEE IT Professional, pp. 40-46, Jul-Aug 2002.
[Samson 1997]
Samson, S. L. MVS Performance Management: OS/390 Edition, McGraw-Hill, 1997. The z/OS edition is available in digital format.
[Smith and Williams 2002]
Smith C. and Williams L., Performance Solutions: A Practical Guide to Creating Responsive, Scalable Software, Addison-Wesley, 2002.
[Tabor 1970]
Tabor, R. The War of the Flea: A Study of Guerrilla Warfare Theory and Practice, Paladin, London, U.K., 1970.


Footnotes:

1 Joint Copyright © 2002-2003 Performance Dynamics Company and IEEE. All Rights Reserved. Permission has been granted to CMG Inc., to publish this version in CMG MeasureIT.

2 Alan Schulman recently commented to me that what is lacking is a champion like Barry Merrill. RMF and SMF didn't just magically appear in MVS either.

3 When Einstein was asked what he thought was the greatest force in the universe, he quipped, "Compound interest!" Today, he might well say "Wall Street!"

Last Updated 03/26/03

