September, 2007
Reviewed by Mark Friedman
Springer, 2006, 253 pp
ISBN-10: 3-5402-6138-9
Neil Gunther, over the years one of the most active contributors to MeasureIT, has a new book, entitled Guerilla Capacity Planning: A Tactical Approach to Planning for Highly Scalable Applications and Services, published by Springer. The book is a companion to Neil's last book, Analyzing Computer System Performance with Perl::PDQ from the same publisher, which first became available in 2005. ("PDQ" is short for "Pretty Damn Quick" - can you say that in a family publication? - which is the name of the modeling software that Neil provides in a free download from his company web site at http://www.perfdynamics.com/.) While the earlier book discusses the mathematical background necessary to wield the modeling package effectively and supplies numerous examples illustrating PDQ in action, the latest book is more about attitude. More specifically, it discusses the approach Gunther recommends for being an effective computer capacity planner in today's fast-paced IT environments. It also features several extensively documented case studies that illustrate the principles in action.
You may already be familiar with the guerilla approach. Neil wrote a two-part article for MeasureIT on the subject back in the spring of 2003 where he first cogently described the dilemma today's capacity planners often face:
Several years ago ... I realized not only were people not gravitating towards capacity planning, they actually seemed to be avoiding it at any cost! From this standpoint, we performance experts appeared more like clergy preaching from the pulpit after the congregation had well and truly vacated the church.
- "Guerrilla Capacity Planning, PART I: Hit-and-Run Tactics for Website Scalability"
MeasureIT, April 2003
Could it be that the "classic" mode for pursuing computer capacity planning that was developed and refined during CMG's formative years is currently outdated and irrelevant? The current book is an extended riff on this topic. It is, by turns, insightful, idiosyncratic, playful, argumentative, provocative, sometimes obscure, but seldom dull - a little like an extended conversation with the author himself. If you are a fan of the author's work, as I am, you will want to add Gunther's latest book to your collection. If you are new to the field - or the man - the new book is certainly the most accessible introduction yet to the practical approach he advocates.
So what is guerilla capacity planning and how is it different from the "classical" practices advocated by its original disciples? Gunther is quite clear that his approach is a tactical one. The guerilla fighter of the metaphor is someone who avoids a pitched battle; instead, he capitalizes on any perceived weaknesses and hits and runs. The epigraph that frames Chapter 1 of the new book paraphrases Chairman Mao's Little Red Book: "When the enemy advances, we retreat; when the enemy camps, we harass; when the enemy tires, we attack; when the enemy retreats, we pursue." Gunther wants short engagements with rapid production of modeling results, often using simple spreadsheet-based tools or PDQ. You get the idea. We need to become lean, mean planning machines.
Neil assumes there is little or no budget for additional software tools. (Given that many of the costly tools we purchased in the past are now expensive shelfware, it is no wonder the purse strings are tightly held.) Modeling accuracy is not prized and model validation is hardly mentioned. He would never recommend that you spend weeks and weeks absorbed in a model calibration study, for instance.
There is another way to talk about this approach without getting caught up in the military metaphor Gunther favors. The 20th century philosopher Isaiah Berlin famously classified some of the great thinkers in Western thought as either hedgehogs or foxes, from an epigram popularized by Erasmus, but originally attributed to an old Greek poet, "The fox knows many things, but the hedgehog knows one big thing." In days past, capacity planners could be successful if they were hedgehogs, accumulating prodigious amounts of expertise about a single, monolithic problem domain. This was effective because those were the days when 90% of the budget for enterprise computing was directed at a single vendor's proprietary hardware platform, namely IBM mainframe computers. We didn't call them dinosaurs for nothing. They moved slowly. You could also move relatively slowly and still manage to keep pace with their evolution. And they were expensive, too, which along with the long procurement cycle, magnified the need for accuracy.
Times have sure changed. Today's experienced capacity planning campaigner faces a population of servers that is multiplying and morphing like mammals on this side of the K-T extinction event. Not only do we face many competing hardware (multi-core, blades, grids, NUMA) and software choices (J2EE, .NET, Web services, WebSphere, MQ Series, COM+, AJAX, WebLogic, virtualization, etc.), but we must also deal with immature and incomplete instrumentation, most of it poorly understood and ragged at best. You need to get in quick and generate answers fast to novel problems, with many uncertainties and irregularities. To survive and prosper, you need access to a rich set of analytical tools and the skill to apply them creatively. You need to be a fox. And the slyest fox prowling the ravaged landscape that used to be known as computer capacity planning may be Neil Gunther.
Chapters 1 and 3 set the book's agenda. Instead of simply moaning about the lack of management commitment to application capacity forecasting and modeling, Gunther drills into why it is that the typical IT organization resists incorporating performance modeling into the application development life cycle. This discussion of why IT organizations don't utilize formal capacity planning methods to procure hardware or incorporate performance engineering techniques of estimation and modeling into the application development life cycle is a crafty piece of analysis.
Gunther observes that computer capacity planning arose as an adjunct to the long procurement cycles associated with expensive mainframe hardware. It was associated with complex (and expensive) software tools that were used to extract large quantities of mainframe performance data and massage it into suitable form for building elaborate closed networked queuing models that could then be solved analytically. Even though the entire modeling validation process was fraught with uncertainty, customers of these software tools greatly valued the accuracy of the predictive models they produced because there was little room for error when buying a multi-million dollar hunk of iron on a quarterly or annual basis. The vendors of these costly tools also reinforced the view that performance modeling and prediction were time-consuming processes. Since the vendors themselves trained most of the practitioners who would appear at venues like CMG, the result was a closed loop system, narrowly, but precisely focused on this problem domain.
Chapter 4, a discussion of the author's model of parallel processing scalability, is the book's centerpiece. It introduces what Gunther calls a Universal Scalability Model, an elaboration of the formula known as Amdahl's Law to which he adds another parameter, k, that represents coherency delay:
$$C(p) = \frac{p}{1 + s\,(p-1) + k\,p\,(p-1)}$$
where p is the number of parallel processors, s represents the portion of the workload that must run serially, and C(p) is the effective capacity. The result is a rational function that is concave:
[Chart: throughput curves plotted against the number of processors, with s = 0.1; the dotted curve shows Amdahl's law.]
Amdahl's law famously entered the literature on parallel programming as an attempt to define the theoretical upper limits of the scalability of the parallel processing approach in order to discredit it. It predicts a diminished rate of return from adding more and more parallel processing engines (the dotted throughput curve in the chart above) based solely on the existence of portions of the executing program that unavoidably and inevitably must run serially. (The parameter s represents the serial portion of the program. In the curves drawn above s = 0.1, meaning only ten percent of the code path requires serial execution.) Amdahl's simple law has proved remarkably resilient over the years, despite the best efforts of researchers in the field of parallel programming committed to proving him wrong.
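To make the diminishing returns concrete, here is a short worked calculation. It assumes the standard speedup form of Amdahl's law, written in the notation used in this review (p processors, serial fraction s, effective capacity C(p)):
$$C(p) = \frac{p}{1 + s\,(p-1)}, \qquad \lim_{p \to \infty} C(p) = \frac{1}{s}$$
With s = 0.1, ten processors deliver only C(10) = 10/1.9 ≈ 5.3 processors' worth of work, one hundred deliver C(100) = 100/10.9 ≈ 9.2, and no number of processors can ever exceed 1/s = 10.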
Gunther's modification is designed to be a better fit to the actual scalability data we observe in parallel processing environments. Not only does adding more parallel processing threads yield diminishing returns, at some point the throughput curve turns south: adding more parallelism starts to reduce the throughput and increase the execution time. Gunther labels this second parameter of the model coherence, which is an echo of the cache coherence overhead manifest at the processor hardware level when we build symmetric multiprocessors (SMPs), still the most widely available form of multiprocessor architecture.
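The turning point can be illustrated with a few lines of Python. This is only a sketch, assuming the commonly published form of Gunther's model, C(p) = p / (1 + s(p-1) + k p(p-1)). The value k = 0.001 is a made-up number chosen for illustration, not one taken from the book, and the function names are hypothetical. Setting the derivative of C(p) to zero puts the peak at p* = sqrt((1 - s) / k).

```python
import math


def amdahl(p, s):
    """Amdahl's law: effective capacity with serial fraction s."""
    return p / (1.0 + s * (p - 1))


def usl(p, s, k):
    """Assumed two-parameter form: Amdahl's law plus a coherency term k*p*(p-1)."""
    return p / (1.0 + s * (p - 1) + k * p * (p - 1))


def usl_peak(s, k):
    """Processor count at which the assumed USL curve reaches its maximum."""
    return math.sqrt((1.0 - s) / k)


S = 0.1    # serial fraction used in the review's chart
K = 0.001  # hypothetical coherency parameter, for illustration only

for p in (1, 8, 16, 30, 64, 128):
    print(f"p={p:4d}  amdahl={amdahl(p, S):6.2f}  usl={usl(p, S, K):6.2f}")

print("throughput peaks near p =", round(usl_peak(S, K)))
```

Under these assumed values the Amdahl curve keeps creeping toward its ceiling of 1/s, while the two-parameter curve rises, peaks, and then declines as processors are added.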
Having introduced his scalability model in Chapter 4, Gunther takes the reader through several detailed examples of its application in Chapter 5 to parallel processing hardware benchmark data to demonstrate its merit. Readers who are unfamiliar with how to use the regression functions in Microsoft Excel will benefit from Gunther's step-by-step walk-through here. He illustrates the care the analyst must often take to transform the raw benchmark into useful data for analysis. The examples he provides also demonstrate dealing with missing or incomplete data, outliers, and other complications that the practitioner frequently confronts.
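For readers who would rather script the fit than click through a spreadsheet, the following Python sketch shows one way the regression could be set up. It is a stand-in for the Excel procedure, not a reproduction of it. It assumes the commonly published form C(p) = p / (1 + s(p-1) + k p(p-1)), which can be rearranged as p/C(p) - 1 = k x^2 + (s + k) x with x = p - 1, so a quadratic through the origin can be fitted by least squares. The function names are hypothetical, and the benchmark numbers are synthetic, generated from known parameters only to check that the fit recovers them.

```python
def usl(p, s, k):
    """Assumed two-parameter scalability form."""
    return p / (1.0 + s * (p - 1) + k * p * (p - 1))


def fit_usl(procs, throughput):
    """Estimate (s, k) from throughput measured at each processor count.

    The first measurement must be the single-processor run, which is used
    to normalize throughput into relative capacity C(p).
    """
    if procs[0] != 1:
        raise ValueError("first measurement must be the single-processor run")
    base = throughput[0]
    sx2 = sx3 = sx4 = sxy = sx2y = 0.0
    for p, measured in zip(procs, throughput):
        c = measured / base          # relative capacity C(p)
        x = p - 1                    # transformed regressor
        y = p / c - 1.0              # transformed response
        sx2 += x ** 2
        sx3 += x ** 3
        sx4 += x ** 4
        sxy += x * y
        sx2y += x ** 2 * y
    # Normal equations for y = a*x^2 + b*x (no intercept), solved by Cramer's rule.
    det = sx4 * sx2 - sx3 * sx3
    if det == 0:
        raise ValueError("need at least two distinct processor counts above 1")
    a = (sx2y * sx2 - sxy * sx3) / det
    b = (sx4 * sxy - sx3 * sx2y) / det
    return b - a, a                  # s = b - a, k = a


# Synthetic "benchmark" data: NOT from the book, generated for this check only.
procs = [1, 2, 4, 8, 16, 32, 64]
throughput = [100.0 * usl(p, 0.1, 0.001) for p in procs]

s_hat, k_hat = fit_usl(procs, throughput)
print(f"estimated s = {s_hat:.4f}, k = {k_hat:.6f}")
```

Real benchmark data will be noisy, so the recovered parameters should be checked for plausibility (both non-negative) and the fitted curve compared against the measurements before it is used for any extrapolation.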
The author claims Chapter 6 deals specifically with the applicability of his Universal Scalability Model to software scalability, but I am not persuaded that he is doing anything unique or profound here. From its very conception, Amdahl's law is about the performance of software on parallel processing hardware while Gunther's amplification provides a better fit to the underlying empirical data. Chapter 6 reviews the scalability results from some parallel SPEC benchmark runs, and some artificial benchmarking data the author generated using the Benchmark Factory product with Microsoft SQL Server and, finally, refers to some published benchmark results using the Microsoft Web Application Stress (WAS) tool to stress test a three-tiered application (web front-end, component middleware, and database back-end). In each instance, he shows how his 2-parameter scalability equation can be adjusted to fit the throughput curves generated in each case accurately.
So far, so good. At this point in time, we have certainly accumulated ample evidence that Gunther's 2-parameter scalability equation model can be readily fitted to the throughput curves that measure the performance of parallel programs. There is little dispute here - Gunther's law is a concise representation of the scalability behavior that parallel programs typically exhibit, and I believe it is a very useful result. But it is also unclear what use the practical performance analyst (not coincidentally the title of Gunther's first published book on the subject) can make of this result to solve a typical capacity planning problem, for example, sizing a computerized solution where very little is known up front about the extent to which the solution is able to be parallelized or many other of the performance characteristics of some or all of the major hardware and software components.
And it is here, perhaps, that Gunther mistakes the mission of the guerilla capacity planner to deal with these uncertainties with that of a theoretical physicist (which happens to be Neil's academic background) whose goal is to derive a graceful mathematical representation underlying some observable physical phenomenon. At issue here is that if we are already in possession of the benchmark scalability data for our application, we have very little need for graceful equations that can be fitted to those curves to figure out what hardware we have to buy. His modeling insights are valuable, but they would be even more valuable when they allow us to extrapolate from a current sample of the performance data to a future computing environment as yet unknown and perhaps even unspecified.
The reader is left at this point to wonder whether the author is merely showing off his considerable erudition, or whether he has some practical suggestions that can help the rest of us. For example, Amdahl's law was based on the observable behavior of an executing program with both parallel and serial phases. As we saw, the parameter s can be estimated directly by measuring the duration of the single-threaded phase of the program. Gene Amdahl's essential insight was that no degree of parallelism can ever possibly improve the execution time of the serial phase. One reason for the durability of Amdahl's law in the field of parallel processing is that it is clearly empirically grounded. Not so with Gunther's coherence parameter. He provides little insight into how we might go about measuring the coherence associated with a parallel workload. Yet this is exactly what we practical fellows would like to know from a theoretician.
Gunther's extension of Amdahl's law is about what happens when we attempt that massive parallelization effort. There is definitely another element that arises that further limits the scalability of our applications. When software engineers attempt very fine-grained parallelism, for example, they encounter this limitation on scalability as a result of the need to join the parallel computational threads they originally spawned just prior to entering the program's penultimate serial component. In cases like this, there is a delay acquiring a lock or write barrier that guards the serial portion of the program that tends to grow exponentially with the number of parallel threads contending for this lock. Gunther's coherence parameter k does account for this behavior mathematically, but he is silent on the rather crucial question facing the capacity planner of how to estimate the magnitude of this parameter other than using a post hoc curve fitting procedure after the benchmark data is available. It is not a fatal flaw, but it is something Gunther the theoretician should ponder.
Chapter 7 of the book is a version of the virtualization paper Neil presented in 2006 at the annual CMG conference in December. It is animated by his vision of a continuous virtualization spectrum whose performance can be characterized by the frequency with which the virtualization hypervisor polls the virtual machines it controls. This is in many ways a virtuoso performance that can be read for the sheer pleasure it affords in observing his agile and active mind at play. Unfortunately, the analysis itself is less than profound in several spots, again providing little of practical value to the professional capacity planner faced with the problem of sizing virtual servers. As I have written elsewhere, Gunther's analysis of the VMware ESX virtual machine scheduling algorithm focuses on what is probably the least significant performance aspect of the technology. His one big idea here, an analogy to the continuous electromagnetic spectrum, while clever, does not lead to any great insight into the performance of hyperthreaded processors at what Gunther calls the micro-level of the virtualization spectrum or the macro-level of grid computing. He fails to note in both cases the importance of the degree of concurrency in the underlying workload, which is the main factor in determining how well the hardware solution performs.
Chapter 8 is outstanding. It is an extended capacity planning case study that begins by characterizing the traffic at a major (unnamed) web site and provides the book's clearest illustration of the guerilla tactics that Gunther espouses. He shows how to utilize a variety of free tools, including Perl scripts to massage the raw performance data these collectors produce. He then carefully walks the reader through the creation of a multivariate regression model of the performance of a cluster of web servers using Excel that ingeniously accounts for latent CPU demand. Again, he is meticulous about detailing the specific steps the analyst must take to build a statistical model using Excel. It is one of the finest examples I know of that documents the capacity planner's art and science.
After the bravura performance of Chapter 8, the remainder of the book is a bit of an anti-climax. Chapter 9 explores the potential for massive networks that rely on either peer-to-peer (P2P) computing protocols or multi-dimensional approaches to constructing computing grids like hypercubes. Not very many capacity planners need to confront scalability problems of this proportion, but it is certainly interesting to contemplate their solution. Chapter 10 is an excursion into characterizing Internet packet traffic that, ignoring as it does the actual queuing behavior of IP routers, seems largely academic. The final chapter of the book is a case study contributed by James Yaple, originally published here on MeasureIT, that Neil both motivated and inspired. Yaple's chapter discusses using the open source Orca solution developed for capacity planning that also leverages RRD, another powerful and popular open source tool. Gunther performs a valuable service by endorsing these highly effective open source tools. He also unwittingly highlights how little has changed since the days when a capacity analyst armed with little more than a copy of Barry Merrill's Extended Guide book and SAS source code faced down the hardware vendor with a deal to propose.
Gunther's book concludes with a lengthy appendix, which includes his Guerilla Manual, that is also printed in a portable version stuffed in a pouch inside the back cover. This contains a variety of aphorisms and witticisms on the subject of capacity planning, some trivial and some profound.
Gunther's book is an expert's guide to adapting successfully to the changing landscape of capacity planning. Given the unmistakable downward trend in CMG membership since the early 90s, this very successful practitioner's perspective on the future of capacity planning definitely needs to be heard. Anyone interested in pursuing a long-term career in this field should be interested in what he has to say here on this vital subject.
In fairness, the guerilla tactics Gunther advocates are hardly as revolutionary in scope as Neil portrays them. If it sometimes seems like some model, any model, is good enough - and certainly better than having performed no analysis at all - some of Neil's more rhetorical flourishes on the subject are probably designed as a corrective to compensate for what he sees as excessive amounts of time devoted to model calibration in the past. While model calibration is de-emphasized, I don't believe Neil wants us to ignore the importance of modeling accuracy completely. In the presence of much speculative data on both the workload growth trajectory and the workload characterization, it is important not to stress too much over accuracy in order to turn around results quickly. After all, capturing the essential behavior, while leaving out a lot of the detail, is the whole point of the modeling exercise. And, where he illustrates using statistical methods on several fronts, the importance of calibration is cloaked in a pointed discussion of goodness-of-fit metrics.
Needless to say, I found Neil Gunther's Guerilla Capacity Planning to be an excellent read, filled with fascinating insights and useful commentary. It illustrates the extraordinary range of his nimble mind. It is a worthy addition to any capacity planner's bookshelf.
Do yourself a favor and pick up a copy.