MeasureIT

Visualizing Virtualization
March, 2007
by Dr. Neil J. Gunther, Guest Editor

About the Author
Neil J. Gunther, Performance Dynamics Consulting℠

Neil Gunther, M.Sc., Ph.D. is an internationally recognized consultant who founded Performance Dynamics Company (www.perfdynamics.com) in 1994. Prior to that, Dr. Gunther applied his training in theoretical physics to research and management positions at San Jose State University, JPL/NASA (Voyager and Galileo missions), Xerox PARC and Pyramid/Siemens Technology. His computer performance analysis and capacity planning classes have been given at both corporate and academic institutions including AOL, Boeing, FedEx, Motorola, Stanford University, Sun Microsystems, SAGE-Australia and Thales Group (Holland).

Dr. Gunther is the author of numerous papers on computer performance, as well as three books: THE PRACTICAL PERFORMANCE ANALYST, McGraw-Hill (1998), ANALYZING COMPUTER SYSTEM PERFORMANCE WITH PERL::PDQ, Springer-Verlag (2005), and GUERRILLA CAPACITY PLANNING, Springer-Verlag (2006). He is well-known to CMG (Computer Measurement Group) audiences for his presentations since 1993, and his very popular articles in the CMG MeasureIT online magazine. In 1996 Dr. Gunther was awarded Best Technical Paper at CMG and in 1997 he was nominated for the A.A. Michelson Award.

Performance Dynamics has recently embarked on joint research into QIT (Quantum Information Technology) and Dr. Gunther has developed a theory of "qubit bifurcation", which is being tested experimentally. Since Dr. Gunther has Dirac number 2 (his M.Sc. supervisor was Prof. C. J. Eliezer; one of Dirac's few research students) and his Ph.D. was awarded in the UK for studies in quantum field theory and phase transition phenomena, he is well-equipped to explore the new frontier between classical IT and QIT.

Dr. Gunther was born in Melbourne, Australia and is a member of the AMS, APS, ACM, CMG, IEEE, and INFORMS.

This issue of CMG MeasureIT focuses on the hot topic of Virtualization. It includes a selection of previously published MeasureIT articles, one paper reproduced from the CMG 2006 conference proceedings, and a late-breaking paper on consolidation under virtual machines in a Microsoft Windows environment. Each of these selected papers, along with some additional papers cited in the References section below, highlights an important aspect of virtualization from the standpoint of performance analysis and capacity planning.

Virtualization is about creating illusions. Although this concept first appeared on mainframe computers (because they had the horsepower to support it, and still do), computer systems today are sufficiently powerful to present users with the illusion of a single physical machine appearing as multiple virtual machines (VMs). This multiplicity takes on a number of different guises, e.g., hyperthreaded virtual processors or guest operating systems (OS) running as VMs on a hypervisor OS, also called a virtual machine monitor (VMM). Many readers will already have been exposed to at least one type of VM. Less apparent to many readers is the notion that virtualized services or hyperservices, such as GRID computing and peer-to-peer (P2P) services like BitTorrent, also rely on VM-style architectures. I shall return to this point shortly.

Moreover, as evidenced by the presentations at CMG conferences over the past several years, each of these VM types is usually thought of as being quite distinct and unrelated. In fact, I held this same view until I came to write the chapter on the Fundamentals of Virtualization for my new book Guerrilla Capacity Planning. Based on some presentations at CMG 2004, I already suspected that hypervisors might be based on something called a fair-share (FS) scheduler [2], a topic I had already discussed at CMG in 1999 [1]. Whereas a time-share (TS) scheduler provides each user with the illusion that they are the only user of the physical processors, FS provides each user (or group of users) with the illusion that they possess their own VM whose service rate is scaled according to the allocated resource entitlement [1]. The system administrator prorates entitlement by allocating different numbers of shares to different users and groups, just like owning equity shares in a corporation. The greater your share entitlement, the greater your maximal allowed resource consumption.
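To make the share arithmetic concrete, here is a minimal Python sketch of how share allocations might translate into entitlements and scaled service rates. The group names, share counts and the physical service rate are hypothetical, and the simple proportional rule is an assumption for illustration; a real FS scheduler also accounts for past usage and for shares left idle by other users.

```python
def entitlements(shares):
    """Return each group's entitlement as its fraction of all allocated shares."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("total shares must be positive")
    return {name: count / total for name, count in shares.items()}


def virtual_service_rates(shares, physical_rate):
    """Scale the physical service rate by each group's entitlement.

    Assumes every group is busy at once, so nobody borrows idle capacity.
    """
    return {name: frac * physical_rate
            for name, frac in entitlements(shares).items()}


# Hypothetical allocation chosen only for illustration.
shares = {"dba": 60, "web": 30, "batch": 10}
fractions = entitlements(shares)
rates = virtual_service_rates(shares, physical_rate=100.0)  # e.g., 100 jobs/sec

for name in shares:
    print(f"{name}: entitlement {fractions[name]:.0%}, "
          f"virtual rate {rates[name]:.1f}")
```

Under these assumptions the group holding 60 of the 100 shares sees a VM that runs at roughly 60% of the physical service rate whenever all groups are competing.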

This hunch became reality while I was perusing the references in Gene Fernando's comprehensive CMG 2005 paper [3]. In particular, he cites an online document entitled "ESX Server Performance and Resource Management for CPU-Intensive Workloads" from VMware, Inc. The section on "Allocating CPU Resources via Shares" (starting on p. 14) explicitly discusses how the choice of share allocations can significantly impact the performance of a guest VM (the default allocation is 1000 shares per guest). They employ the 164.gzip benchmark code from the SPEC suite of integer-based benchmarks as the test workload. This is a purely CPU-intensive workload, but that also makes it easy to understand the impact of share allocations. I also knew that the FS scheduler uses a polling protocol to provide physical service for the allocated virtual services. The FS polling interval is typically on the order of five seconds [1].

Armed with this insight, I then went back and reviewed some earlier CMG articles on hyperthreading. These included two MeasureIT articles, one by Mark Friedman entitled "Hyperthreading - Two For The Price Of One?" and another by Ellen Friedman entitled "Tales from the Lab: Best Practices in Application Performance Testing", which discuss, respectively, how hyperthreading works and how difficult it is to measure. In addition, a CMG conference paper by Scott Johnson [4] provided valuable performance data generated by carefully controlled laboratory measurements using multi-threaded workloads. These data enabled me to build some elementary performance models in PDQ from which it was clear that some of the overhead in hyperthreading was also due to polling, albeit at a much higher rate than is true for FS polling. If your application thread is sitting in the buffer that is currently not being serviced, it gets to wait. This appears to the OS as service-time stretching for that application [4]. It is noteworthy that Fernando [3] reports a similar effect in certain BMC Patrol 2000 data. Frustrated at being broadsided by the virtual, Fernando suggests disabling HTT if you are serious about server capacity planning. I came to call this effect the Missing MIPS paradox [5]. More on this in a minute.

Generalizing this insight led me to construct a unified framework, the VM Spectrum [5], by which the variety of VMs could be classified. The continuous electromagnetic spectrum can be broken into three primary regions: the visible region (VR) with frequencies that our eyes respond to, the ultraviolet region with radiation frequencies much higher than our eyes can detect, and the infrared region with frequencies much lower than our eyes can detect. This choice of regions is arbitrary in that it is biased by the fact that we see. Similarly, the VM spectrum can be defined in terms of three primary regions:

  1. Micro region (like UV) involving high-frequency polling VMs such as hyperthreading, e.g., Xeon processor

  2. Meso region (like VR) involving intermediate-frequency polling VMs such as hypervisors, e.g., VMware and Xen

  3. Macro region (like IR) involving low-frequency polling VMs such as hyperservices, e.g., GRIDs and P2P

The implication that Meso-VMs are somewhat more "visible" to the capacity planner is intentional in that there are more tuning knobs available through features like share allocation, whereas Micro- and Macro-VMs remain "invisible" in the sense of black boxes. One difference, of course, is that the VM spectrum is discrete rather than continuous because each VM type possesses its own characteristic polling frequency.

A polling system is already less efficient than a simple queueing system. Those of you who have taken my classes will recall that I often introduce queueing concepts using the familiar example of a grocery store checkout. The cashier is the server and customers form a single waiting line at the checkout to have their groceries rung up. A more efficient queueing system has multiple cashiers servicing the single waiting line because this introduces a weak form of parallelism. I know of a Safeway store in Melbourne, Australia that implements a multiserver queue with six cashiers for its Express Lane. By analogy, a polling system is more like a grocery store with only one cashier to service all the checkout stands! Sound insane? Well, it can make sense for the case where, e.g., each customer only has one item to check out. In fact, most operating systems implement round-robin priority queues in this way, and polling is used by certain high-speed packet switches [6]. The important point here is to recognize that polling represents another dimension for performance trade-offs in VMs.
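The efficiency gain from a single waiting line feeding several cashiers can be illustrated with textbook queueing formulas. The Python sketch below compares six separate single-cashier lines (each treated as M/M/1) with one shared line served by six cashiers (M/M/m, using the Erlang C formula). Poisson arrivals, exponential service times, and the arrival rate and service time shown are assumptions made purely for illustration; they are not measurements from any store, and the sketch does not model polling itself.

```python
from math import factorial


def erlang_c(m, a):
    """Probability that an arrival must wait in an M/M/m queue.

    m is the number of servers and a = arrival_rate * service_time is the
    offered load; requires a < m for a stable queue.
    """
    if a >= m:
        raise ValueError("offered load must be less than the number of servers")
    rho = a / m
    top = (a ** m / factorial(m)) / (1.0 - rho)
    bottom = sum(a ** k / factorial(k) for k in range(m)) + top
    return top / bottom


def residence_time_shared(arrival_rate, service_time, m):
    """Mean time in system for one waiting line feeding m cashiers (M/M/m)."""
    a = arrival_rate * service_time
    waiting = erlang_c(m, a) * service_time / (m - a)
    return service_time + waiting


def residence_time_separate(arrival_rate, service_time, m):
    """Mean time in system when arrivals split evenly over m M/M/1 lines."""
    rho = arrival_rate * service_time / m
    if rho >= 1.0:
        raise ValueError("each line must be less than 100% busy")
    return service_time / (1.0 - rho)


# Hypothetical express lane: 6 cashiers, 1-minute service, 4.8 customers/min.
separate = residence_time_separate(4.8, 1.0, 6)
shared = residence_time_shared(4.8, 1.0, 6)
print(f"six separate lines: {separate:.2f} min per customer")
print(f"one shared line:    {shared:.2f} min per customer")
```

With each cashier about 80% busy in both cases, the shared line gives a noticeably shorter mean time in the system, which is the weak parallelism referred to above.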

Peg McMahon's MeasureIT article discusses the difficulties of capacity planning for GRID hyperservices. It is likely (and hoped) that we will see more CMG papers on the performance management of Macro-VM hyperservices (e.g., so-called Service Oriented Architectures [7]) in the future. To make life even more exciting for system analysts and capacity planners, it is possible to build Macro-VMs that run on top of Meso-VMs, which run on top of Micro-VMs. Reminds me of the old rhyme: Big bugs have little bugs upon their backs to bite 'em. And little bugs have littler bugs and so ad infinitum. This can really mess with your head if you do not appreciate the commonalities between these VMs as defined by the VM spectral regions. In fact, Meso-on-Micro is probably the most ubiquitous VM configuration discussed (so far) in CMG presentations on virtualization.

The Missing MIPS paradox [3,4] can now be understood in terms of how Micro-VMs poll for threads under certain conditions. Hyperthreading, also known as Hyper-Threading Technology (HTT) or Simultaneous Multi-Threading (SMT) in Intel parlance, is primarily a way to saturate a single physical execution unit (EU) by soaking up any remaining idle cycles. A processor like the Xeon has two ports (AS registers in Intel parlance) available to the same EU. When HTT is disabled, only one AS register is accessible to the OS run-queue, so TS scheduling works the same way as it does for a single physical CPU with time-slicing. However, when HTT is enabled, the OS has to know how to schedule work onto both AS registers. These two registers act like 1-deep thread buffers. Provided different software applications are appropriately threaded (and that's often a big assumption), one set of application threads can be scheduled onto one of the AS buffers (say AS0), while another set of application threads can be scheduled onto the other AS buffer (say AS1). When a thread stalls on AS0, the EU would normally become idle, but with HTT enabled the EU can service the AS1 buffer until AS0 becomes ready again. That is analogous to the single cashier switching between two checkout stands in a grocery store. Actually, Intel does not tell us what the exact scheduling discipline is inside the Xeon. Anyway, this is all well and good if you're on AS0, but not so wonderful if you are on AS1 because you may spend a lot of time waiting for AS0 to stall. Moreover, from the standpoint of capacity analysis for servers using Xeon parts, the OS gets fooled into thinking AS0 and AS1 represent two virtual CPUs or VPUs [5], which potentially offer twice the CPU capacity of a non-HTT processor, viz., 200%. In reality, this 200% capacity cannot be realized because the physical EU is generally more than 50% busy. Best-case controlled measurements [4] show that the EU only has about 25% idle cycles (1/4 × 100%) available to service register AS1, and therefore the OS never sees more than 2 × 75% = 150% (virtual) capacity consumed. As reported in many CMG presentations, this virtual arithmetic (propagated to performance tools via the OS) has often led system analysts and capacity planners on a wild goose chase looking for the remaining 1/4 × 200% = 50% of virtual cycles, which never existed in the first place.
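The virtual arithmetic above is easy to reproduce. The following Python sketch simply restates it using the 25% idle figure quoted from [4]; the variable names are my own, and the calculation is a bookkeeping illustration rather than a model of the Xeon's internal scheduling.

```python
PORTS_PER_EU = 2          # AS0 and AS1, seen by the OS as two virtual CPUs
EU_IDLE_FRACTION = 0.25   # best-case idle cycles reported in [4]

# Capacity the OS believes it has: 100% per virtual CPU.
nominal_capacity = PORTS_PER_EU * 100.0

# Capacity the OS can ever report as consumed: 2 x 75%.
observable_capacity = PORTS_PER_EU * (1.0 - EU_IDLE_FRACTION) * 100.0

# The "missing" virtual cycles: 1/4 x 200%.
missing_capacity = nominal_capacity - observable_capacity

print(f"nominal:    {nominal_capacity:.0f}%")     # 200%
print(f"observable: {observable_capacity:.0f}%")  # 150%
print(f"missing:    {missing_capacity:.0f}%")     # 50%
```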

Similar effects have been measured on Meso-VMs [5,8,9]. The plot on the left shows controlled measurements for a WebLogic-J2EE production application performed by my colleagues at RSA Security. It is quite apparent that approximately 25% of the expected throughput (X) is missing relative to the PDQ model prediction, and that, in turn, is due to the knee occurring at N = 6 threads running rather than the expected N = 8 threads running. The platform was a Dell PowerEdge 1750 with dual 3 GHz Xeon processors. This effect was isolated to listen-thread contention in WebLogic. Dual Xeons (i.e., 2 EUs) with HTT enabled is equivalent to 4 VPUs. If there were 2 threads per VPU, we would expect to see 8 listen-threads executing. In fact, only 6 threads appear to be executing, i.e., 2-ports × 150% = 3 VPUs instead of the expected 2-ports × 200% = 4 VPUs. Once again, we see the "missing" 50% signature. (Details can be found in [5].) Salsburg, Karnazes and Maimone employ the 164.gzip benchmark to measure the processor overhead of a VMware hypervisor under various settings running on an 8-way Unisys ES7000-540-G3 using 3 GHz Xeons. As mentioned earlier, this is the same CPU-intensive benchmark workload used by VMware, Inc. to discuss the performance impact of share allocations on their ESX Server. The interested reader should compare these two reports. The original Xen development team, in addition to constructing an FS scheduler (see Section 3 in [8]), performed a number of controlled measurements on both Xen and VMware hypervisors using workloads ranging from Linux builds to OLTP database benchmarks (see Section 4 in [8]). For general applications that invoke significant amounts of I/O, memory and network activity, their results show clearly that the overheads can be far greater than the simple Missing MIPS problem for CPU-intensive applications [9]. Thus, from a performance perspective, server consolidation using Meso-VMs may be regarded either as a many-to-one advantage or a many-headed hydra.
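The dual-Xeon thread counts quoted above follow from the same bookkeeping. This short Python sketch restates that arithmetic; the function name and the "two threads per VPU" parameter come from the reasoning in the text, not from WebLogic or PDQ themselves.

```python
def knee_threads(eus, capacity_per_eu, threads_per_vpu=2):
    """Thread count at the throughput knee.

    capacity_per_eu is the virtual capacity per execution unit expressed in
    VPUs: 2.0 for the nominal 200%, 1.5 for the observed 150%.
    """
    return eus * capacity_per_eu * threads_per_vpu


expected_knee = knee_threads(eus=2, capacity_per_eu=2.0)  # 4 VPUs -> 8 threads
observed_knee = knee_threads(eus=2, capacity_per_eu=1.5)  # 3 VPUs -> 6 threads
shortfall = 1.0 - observed_knee / expected_knee           # fraction "missing"

print(f"expected knee at N = {expected_knee:.0f} threads")
print(f"observed knee at N = {observed_knee:.0f} threads")
print(f"shortfall: {shortfall:.0%}")
```

The 25% shortfall matches the throughput gap reported for the RSA Security measurements.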

Finally, the question arises: should you adopt VM technologies or not? As Friedman points out [9], you need to remain cognizant that VMs can introduce significant performance penalties. On the other hand [5], you may still choose to implement your applications on Micro-VMs or Meso-VMs for reasons other than performance, e.g., improved security enforcement, power reduction or just satisfying internal politics. McMahon astutely remarks that users need to apply pressure on all the commercial vendors to make more VM performance statistics available to the OS and to the human system analyst. Indeed, this situation is slowly starting to improve with the advent of internal hardware-state information such as the PURR register in the IBM Power5 processor. PURR stands for Processor Utilization Resource Register. Hardware registers of this type will help to ameliorate performance conundrums like the Missing MIPS problem in Micro-VMs. Additional instrumentation is still needed at the Meso-VM and Macro-VM levels to help elucidate where service times are being stretched between the guest VM and the hypervisor running on the physical platform. The message to commercial VM vendors is clear:

Constructing illusions by hiding physical information from users is one thing, but propagating that illusion to the system analyst by hiding vital performance information is considered harmful and ultimately bad for business.
Perhaps their watchword should be fewer bells, more whistles.

I hope that the central concept of polling and the organization of the VM spectrum that follows from it will help you to better appreciate the presentations of the CMG authors compiled here and, even better, help you to write your own paper on virtualization for CMG 2007.

References

[1] N. J. Gunther, "Capacity Planning for Solaris Resource Manager: All I Ever Wanted was My Unfair Advantage (And Why You Can't Get It!)," CMG Proceedings (on CD), Reno, Nevada, 1999

[2] J. Kay and P. Lauder, "A Fair Share Scheduler," Communications of the ACM, 31, 44-55, 1988

[3] G. Fernando, "To V or Not to V: A Practical Guide To Virtualization," CMG Proceedings (on CD), Orlando, Florida, 2005

[4] S. Johnson, "Measuring CPU Time from Hyper-Threading Enabled Intel Processors," CMG Proceedings (on CD), Dallas, Texas, 2003

[5] N. J. Gunther, "The Virtualization Spectrum from Hyperthreads to GRIDs," CMG Proceedings (on CD), Reno, Nevada, 2006

[6] N. Gunther, K. Christensen and K. Yoshigoe, "Characterization of the Burst Stabilization Protocol for the RR/CICQ Switch," IEEE Conference on Local Computer Networks, October 20-24, Bonn, Germany, 2003

[7] A. W. Shum and J. P. Buzen, "Achieving Business Agility with SOA: Governance and SLA Management of Shared Service Ecosystems," CMG Proceedings (on CD), Reno, Nevada, 2006

[8] P. T. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt and A. Warfield, "Xen and the Art of Virtualization," SOSP (ACM Symposium on Operating Systems Principles), 164-177, 2003

[9] M. Friedman, "The Reality of Virtualization for Windows Servers," CMG Proceedings (on CD), Reno, Nevada, 2006

 

Last Updated 03/20/07


Computer Measurement Group