Moore's Law: More or Less?

May, 2007
by Neil J. Gunther

About the Author
Neil J. Gunther, Performance Dynamics Consulting(SM)

Neil Gunther, M.Sc., Ph.D. is an internationally recognized consultant who founded Performance Dynamics Company (www.perfdynamics.com) in 1994. Prior to that, Dr. Gunther applied his training in theoretical physics to research and management positions at San Jose State University, JPL/NASA (Voyager and Galileo missions), Xerox PARC and Pyramid/Siemens Technology. His computer performance analysis and capacity planning classes have been given at both corporate and academic institutions including AOL, Boeing, FedEx, Motorola, Stanford University, Sun Microsystems, SAGE-Australia and Thales Group (Holland).

Dr. Gunther is the author of numerous papers on computer performance, as well as three books: THE PRACTICAL PERFORMANCE ANALYST, McGraw-Hill (1998), ANALYZING COMPUTER SYSTEM PERFORMANCE WITH PERL::PDQ, Springer-Verlag (2005), and GUERRILLA CAPACITY PLANNING, Springer-Verlag (2006). He is well-known to CMG (Computer Measurement Group) audiences for his presentations since 1993, and his very popular articles in the CMG MeasureIT online magazine. In 1996 Dr. Gunther was awarded Best Technical Paper at CMG and in 1997 he was nominated for the A.A. Michelson Award.

Performance Dynamics has recently embarked on joint research into QIT (Quantum Information Technology) and Dr. Gunther has developed a theory of "qubit bifurcation", which is being tested experimentally. Since Dr. Gunther has Dirac number 2 (his M.Sc. supervisor was Prof. C. J. Eliezer; one of Dirac's few research students) and his Ph.D. was awarded in the UK for studies in quantum field theory and phase transition phenomena, he is well-equipped to explore the new frontier between classical IT and QIT.

Dr. Gunther was born in Melbourne, Australia and is a member of the AMS, APS, ACM, CMG, IEEE, and INFORMS.

1  The Big Announcement

For the past few years Intel, AMD, IBM, Sun, and other microprocessor vendors have been aggressively promoting the concept of multicores. Multicore processors contain multiple execution units or cores on the same silicon die and are packaged as a single module. The message has been that more cores are better than a single core or CPU. Last January, however, Intel and IBM made a highly unorthodox joint announcement. What makes this announcement unorthodox is that IBM and AMD have formed a partnership in the manufacture of future microprocessors, so IBM and Intel are competitors in this context. In case you missed it, here's what they said in a nutshell. Intel and IBM will produce single CPU parts using state-of-the-art 45 nanometer (nm) technology. Intel said it is converting all its manufacturing facilities (fab lines, in the vernacular) and will produce 45 nm microprocessors (code-named Penryn) by the end of this year. The IBM and AMD schedule is a bit less aggressive, but they expect to ship their respective Power series and Opteron series microprocessors by the middle of 2008.

Add to this picture the running commentary that Moore's law (see Section 3) is finished, kaput, dead (even according to Gordon!), or possibly risen from the dead. Any time commercial enemies get together for a public lovefest in such a volatile marketing milieu, it warrants closer scrutiny because it usually means there is something major going on. When I saw this announcement, some of the questions that went through my mind were:

  1. Why was a joint announcement necessary?

  2. How can Intel upgrade all their silicon wafer fabrication lines so fast?

  3. Why would Intel put all their silicon eggs in the same fabrication basket?

  4. Does this mean multicores are already a thing of the past?

  5. Has Moore's law been resuscitated?

Since I have some past experience with VLSI technology from the time when I was a researcher at Xerox PARC, I thought you might find my speculative thoughts on these matters of some interest; especially when you come to do your next round of server procurement. I have to say speculative because I don't have any particular insider information, although I do still hear a lot of rumors on the Silicon Valley grapevine.

The answers to these and other questions are very likely to impact the way we do server performance analysis and capacity planning in the near future.

2  Mr. Moore Meets Mr. Amdahl

Before I get into dissecting the above issues, I want to emphasize why the joint Intel/IBM announcement is so important. It is important, not just from the marketing standpoint, but also for computer performance analysis, and in particular, how it relates to another well-known law viz., Amdahl's law. In case you're confused, the conclusion of Gene Amdahl's 1967 paper can be paraphrased as:

When it comes to running commercial workloads on a single CPU or a multiprocessor, the best performance will be achieved by running it on the fastest single CPU you can get your hands on.

Skeptics are invited to see Section 2.10 of my Perl::PDQ book for a more formal discussion on this point. Gene Amdahl never said anything about parallelism (in the sense we mean it today) and he never wrote down the speedup equation that now bears his name. BTW, it's pure marketing genius to get your name attached to an equation you never wrote! Ironically, all of those appellations were incorrectly attributed by others in subsequent years.

Like Moore's law, Amdahl's law was based on purely empirical evidence. At that time, Amdahl was trying to dissuade people from using multiprocessors, partly because they were more expensive for Amdahl Corporation to build. Amdahl was correct about that then, and he's correct about it today.
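For readers who want to see the speedup equation that now carries Amdahl's name, here is a minimal sketch in Python. It uses the form in which the equation is usually quoted today, not anything taken from the 1967 paper. The function name amdahl_speedup and the 10% serial fraction are illustrative assumptions.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Speedup on `processors` CPUs when `serial_fraction` of the work
    cannot be spread across them (the commonly quoted Amdahl form)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Hypothetical workload: 10% of the execution time is strictly serial.
sigma = 0.10
for p in (1, 2, 4, 8, 16, 1024):
    print(f"{p:5d} processors -> speedup {amdahl_speedup(sigma, p):.2f}")
```

However many processors are added, the speedup in this model can never exceed 1/sigma, which is 10 for the assumed workload. That ceiling is why the fastest single CPU remains so attractive for workloads with a significant serial component.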

Table 1: A selection of current generation Intel and AMD multicores.

Intel Corporation
Series          Model    Cores   Clock (GHz)   Fab (nm)
Core 2 Duo      E6300    2       1.83          65
Core 2 Duo      E6700    2       2.66          65
Core 2 Quad     Q6600    4       2.40          65
Core 2 Extreme  QX6700   4       2.66          65

Advanced Micro Devices
Series          Model    Cores   Clock (GHz)   Fab (nm)
Athlon 64 X2    4400     2       2.3           65
Athlon 64 X2    6000     2       3.0           90
Athlon 64       FX-70    4       2.6           90
Athlon 64       FX-74    4       3.0           90

Multiprocessors that are familiar to us as servers in air-cooled boxes are still expensive to manufacture and engineer for reliability. A tremendous amount of effort goes into removing race-conditions and faults in the hardware state-machines, not to mention symmetrizing the operating system to take advantage of the multiple processors in a scalable way. Multicores, like those shown in Table 1, can be thought of as multiprocessor architectures shrunk down onto a silicon die and contained in a single "black box"; the multicore module. Placing too many high-speed cores in the same module can lead to overheating problems. A burning question (if I can use that phrase) for the 45 nm technology is, How will the clock speeds of the new multicores compare with the speed of the fastest single processors? (See Section 5)

3  Moore's Exponential Law

Moore's law states:

The number of transistors that can be fabricated on a very large-scale integrated (VLSI) chip doubles every two years.

Just like credit card debt, this is a statement about compounded or exponential growth (Figure 1), so the ramifications can be monumental. Gordon Moore of Intel Corporation made his empirically based pronouncement about 40 years ago. In that time, the area of a silicon chip has remained essentially constant; about the size of a postage stamp. Therefore, to accommodate the two-year doubling of transistor density predicted by Moore's law, the size of the transistors and their associated "wires" has to be shrunk accordingly. This is possible because the transistor fabrication process is based largely on something called photolithography. Photolithography is a highly specialized form of photography where the transistors and their associated circuits are built up in layers on a silicon wafer substrate. Each layer is imaged or shot onto the silicon wafer in much the same way an image is formed on a photographic plate. Each image is actually a special kind of mask that lets light through in certain regions and not in others. Between each shot, the silicon wafer is cleaned and prepared for the next layer. Intel and IBM use slightly different lithographic processes from one another, but it requires about 50 photolithographic shots of this type to build up today's generation of microprocessors. The key point is, to double the number of transistors on the chip, each mask image only needs to be shrunk by a factor of two in area, which is a factor of about 0.7 in each linear dimension.
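To make the compounding concrete, the sketch below works through the arithmetic of a two-year doubling period and the mask shrink that goes with it. The function names are illustrative. The only inputs are the two-year period stated above and the assumption that chip area stays constant.

```python
import math

DOUBLING_PERIOD_YEARS = 2.0  # doubling period stated in the text


def growth_factor(years: float, doubling_period: float = DOUBLING_PERIOD_YEARS) -> float:
    """Multiplicative growth in transistor count after `years` of doubling."""
    return 2.0 ** (years / doubling_period)


def linear_shrink_per_doubling() -> float:
    """Linear scale factor that halves feature area on a constant-area chip,
    so that twice as many transistors fit."""
    return 1.0 / math.sqrt(2.0)


for years in (2, 10, 20, 40):
    print(f"after {years:2d} years: x{growth_factor(years):,.0f}")
print(f"linear shrink per doubling: {linear_shrink_per_doubling():.3f}")
```

Forty years at this rate is twenty doublings, or roughly a million-fold increase in transistor count. Each doubling requires only about a 30% reduction in linear feature size.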

Figure 1: Moore's law for memory chips and microprocessors plotted on a semi-logarithmic scale, which has the effect of making nonlinear exponential curves appear linear. The uppermost purple curve is the Moore projection based on data up to 1975; note the kink correction around 1980, which shows that the so-called law is only an approximation. [Source: Intel Corporation]

In 1985 the feature size was around 1000 nanometers (nm). Today, Intel and other chip makers are fabricating with 65 nm feature sizes. For reference, the wavelength of visible light is about 500 nm, ultraviolet light (used in the transistor fabrication process) is about 200 nm, and HIV is about 100 nm in diameter. Obviously, Moore's shrinkage law cannot go on indefinitely because the scale of an atom is finite, in the range of 0.1-0.5 nm, and the fabrication process used to make transistors today cannot be applied to make transistors out of individual atoms.

4  Fabulous Fab Lines

Intel has several silicon chip fabrication facilities or fab lines in Oregon, Arizona, Israel, and elsewhere. Hillsboro, Oregon is a research fab line, rather like a "test kitchen" where new silicon fabrication recipes are dreamt up. These fab lines are amongst the cleanest rooms in the world. Your camera, for example, is too "dirty" to gain entry. Each is the area of three football fields and costs about 3 billion dollars to set up. This astronomical cost represents a huge barrier-to-entry for any new vendor wanting to get into the game, and it also explains why there are fewer and fewer chip vendors year after year. Over and above the sheer cost of the fab line is the time to design and develop a new microprocessor, which is currently about 5 years.

All of Intel's facilities are fully automated. By that I mean, the robots have taken over. Humans are no longer involved in silicon processing like in the good old days (just 20 years ago). In fact, the picture on the right reminds me of something out of Kubrick's movie 2001: A Space Odyssey. A robot transports the wafers, each of which is 300 mm in diameter (about the size of a dinner plate), between processing steps via an overhead monorail track system.

Each processing step is usually carried out in a separate machine. A robot transfers the wafer-carrier from one processing machine to the next machine and so on. Any of these processes can also have its parametric settings altered on the fly by process engineers using computers outside the clean room. That means there is no guarantee that any two wafers have received identical processing. In fact, there is a numeric zoo of identifiers for AMD and Intel microprocessors that tries to distinguish chips that received different processing. Processing includes such things as ion implantation (doping), photoresist application and removal, oxide deposition, and most importantly, metal deposition. That's what produces the metal conductors (wires) that interconnect the transistors and circuits. To accommodate all the complex crisscrossing of the interconnect network, there are 8 metal layers using copper interconnect (M1,M2,...,M8) on the new microprocessors. The bottom layer M1 connects transistors to transistors, while the upper layers are used to connect circuits spanning the chip.

Table 2: Major technology nodes predicted by the SIA 2006 roadmap.

Year   2004   2007   2010   2013   2016   2020
nm     90     65     45     32     22     14

Each progression in shrinking the technology is identified by a waypoint or "node" (Table 2) on the SIA (Semiconductor Industry Association) roadmap. A technology node is defined primarily by the minimum metal pitch used on any product, for example, the Metal-1 layer (M1) half-pitch in a microprocessor. Here, pitch refers to the spacing between wires. You can see that IBM and Intel are well ahead of the SIA roadmap; another reason the announcement in Section 1 was newsworthy. This also explains those numbers in the Fab (nm) column in Table 1. A lot of people quote those numbers, but very few know where they come from. Now you know too. Apparently, Freescale Semiconductor, Inc. has not yet found a way to get below 90 nm reliably. Since they were "fabbing" the PowerPC G5 microprocessor for the Apple Macintosh, that delay in moving from the 2004 SIA technology node to the 2007 technology node was leaving the Mac behind in the CPU-speed sweepstakes. That's partly why Apple Inc. had to cross over to Intel microprocessors.
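The node sizes in Table 2 are not arbitrary. A quick calculation, sketched below with the table values typed in by hand, suggests that each node is roughly 0.7 times the previous one. That is the linear shrink needed to halve feature area and so double transistor density. The variable and function names are illustrative.

```python
# Node sizes in nm, keyed by year, copied from Table 2.
SIA_NODES_NM = {2004: 90, 2007: 65, 2010: 45, 2013: 32, 2016: 22, 2020: 14}


def node_shrink_ratios(nodes: dict[int, int]) -> list[tuple[int, int, float]]:
    """Return (from_year, to_year, linear_ratio) for successive nodes."""
    years = sorted(nodes)
    return [(a, b, nodes[b] / nodes[a]) for a, b in zip(years, years[1:])]


for start, end, ratio in node_shrink_ratios(SIA_NODES_NM):
    # Squaring the linear ratio gives the ratio of feature areas.
    print(f"{start} -> {end}: linear x{ratio:.2f}, area x{ratio ** 2:.2f}")
```

An area ratio near 0.5 at each step corresponds to one density doubling per node. Note that the roadmap spaces most nodes three years apart, and the last step in the table spans four years.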

Moore's law also captures the amazing scale of cost reduction. It explains why you can buy what was a 10,000 MIPS supercomputer a decade ago and have it in your laptop today for a few hundred dollars. From the performance perspective, however, Moore's law is more often associated with increasing MIPS. To understand how Moore's law has run into trouble at 45 nm, we need to understand something about how CMOS transistors work. CMOS stands for Complementary Metal-Oxide Semiconductor and I'll come back to more details about that in Section 6.

How does a transistor work? Imagine your garden hose with the water turned on at a moderate flow from the tap to the nozzle, which is lying in a drainage ditch. The water flowing in the hose is analogous to a current of electrons between a source (analogous to the tap) and a corresponding sink (analogous to the drain). In fact, two of the terminals in a transistor are called the source (labeled S in Figure 2) and the drain (labeled D in Figure 2). Part of the role of a transistor is to control the current flow between the source and drain. When using your garden hose, you normally control the flow of water at the tap (the source). The CMOS transistor does not work that way. Instead, it is more like controlling the water flow by pressing your foot on the hose to stop it (off) and lifting your foot to start it (on). You are likely to be more effective if you are wearing shoes. The transistor terminal that corresponds to your shoe is called the gate and it sits over the top of the source and drain terminals, just like the situation with the hose. The three terminals together form a triode valve, which is the silicon version of the glowing glass tubes you can see in the back of any TV set that is more than ten years old.

Figure 2: Schematic comparison of the standard transistor gate and the new high-K metal gate (shown in blue).

Depending on how the transistor is configured in a circuit, when you push your foot down the current is turned off and that might correspond to a digital `0'. Similarly, when you lift your foot up that might correspond to a digital `1'. This explains how Moore's law is associated with processor speed. As the transistor geometry becomes smaller, the distance between source and drain gets shorter and fewer electrons need to flow under the gate. A major hurdle that has to be overcome in scaling down from 65 nm to 45 nm technology is leakage of electrons between the source and drain. By analogy with the garden hose, trying to stop the water flow with a conventional shoe is no longer effective at these tiny scales, so the difference between a digital `1' and `0' becomes less distinct, and this can cause all sorts of subtle problems. To really pinch the hose tightly, you need to wear metal-capped boots. And very special metal at that, called Hafnium. There are many other special fabrication tricks-of-the-trade required to make the 45 nm fab technology operational, but they need not concern us here.

5  Moore Suffers Heat Stroke

We fell off the Moore's law curve, not because photolithography collided with limitations due to quantum physics or anything else exotic, but more mundanely because it ran into a largely unanticipated thermodynamic barrier. In other words, Moore's law was stopped dead in its tracks by old-fashioned 19th century physics. As CMOS feature sizes are reduced and switching speed increased, thermal power dissipation becomes a serious problem because it is directly proportional to the clock frequency used on the chip. A typical 2-3 GHz CPU generates on the order of 100 Watts (same as a household light bulb). And don't forget the huge power transients at the pins of the chip, which are essentially scale-invariant. Moreover, you end up trying to push that heat through a pinhole: the small die size. In other words, it's really the dissipation of power density that kills you. To the degree that this power density cannot be dissipated, the chip failure rate increases dramatically due to thermal degradation.

The Apple Mac G5 dual processor (IBM Power 5 chip) has a freon cooling system (a la Cray) together with a spectrophotometer that detects freon leaks. If such a leak is detected, the system shuts down immediately. This is another reason Apple went Intel.

Some designers have proposed hetero-speed cores to keep thermal dissipation problems under control. General workloads would run on moderate speed cores and the highest speed cores would only run under specific compiler directives. This is yet another rendition of Amdahl's law, as described in Section 2. As I understand it, this thermal barrier is the reason why multicores have been so heavily promoted by the CPU vendors; keep the cores running at about current clock speeds and compensate for the absence of higher speed by adding more cores per module to provide more aggregate MIPS. Of course, this looks good on paper, but it comes with its own set of problems, which those of us who have been involved with the development of SMPs are familiar with.

Quite apart from Moore's law having been thermally defeated, many application programmers have become rather addicted to the thrill of riding Moore's exponential curve. On that curve, even a purely single-threaded program runs faster with each processor generation, without any additional programming effort. But what happens when you have to run threads across multiple cores? Welcome to the return of concurrent programming as a major performance issue for applications in the foreseeable future. Been there, done that, 15 years ago. But this time it's worse, because the cores are inside a true black-box; the module. Without appropriate hardware registers, any serious performance tuning will likely have to be accomplished in software alone.
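The power-density argument in this section can be put into a back-of-the-envelope sketch. The model below takes only the statement above that dissipated power is proportional to clock frequency. Everything else is a hypothetical assumption chosen to give round numbers: the Chip class, the 1 square centimeter of die area per core, and the 40 Watts per GHz per core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """Toy thermal model; all parameter values used below are hypothetical."""
    cores: int
    clock_ghz: float
    core_area_cm2: float           # assumed die area occupied by each core
    watts_per_ghz_per_core: float  # assumed proportionality constant

    @property
    def power_watts(self) -> float:
        # Power proportional to clock frequency, summed over cores.
        return self.cores * self.clock_ghz * self.watts_per_ghz_per_core

    @property
    def area_cm2(self) -> float:
        return self.cores * self.core_area_cm2

    @property
    def power_density(self) -> float:
        """Watts per square centimeter that must be removed from the die."""
        return self.power_watts / self.area_cm2


baseline = Chip(cores=1, clock_ghz=2.5, core_area_cm2=1.0, watts_per_ghz_per_core=40.0)
faster = Chip(cores=1, clock_ghz=5.0, core_area_cm2=1.0, watts_per_ghz_per_core=40.0)
dual = Chip(cores=2, clock_ghz=2.5, core_area_cm2=1.0, watts_per_ghz_per_core=40.0)

for name, chip in (("baseline", baseline), ("2x clock", faster), ("2 cores", dual)):
    print(f"{name:9s} {chip.power_watts:6.1f} W  {chip.power_density:6.1f} W/cm^2")
```

Under these assumptions, doubling the clock and doubling the core count both double the total power. Only the faster clock doubles the power density, because the second core brings its own die area with it. A real chip is more complicated (supply voltage, leakage, and shared caches all matter), so treat this as an illustration of the vendors' reasoning and not as a design rule.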

6  M is for Metal (Not Moore)

In an ironic twist of fate, the new CMOS transistor technology actually hearkens back to the earliest transistor implementations. When I was involved with VLSI design tools at Xerox PARC, we used the Mead-Conway design rule: "poly over silicon" produces a transistor. The word poly refers to polysilicon or amorphous silicon. The Mead-Conway rule was actually shorthand for the following: a layer of poly-Si, over an implied silicon dioxide insulator, over a doped Si substrate with implied source and drain, leads to a transistor being produced from the photolithographic masks generated by the VLSI CAD tool. Now, recall from Section 4 that the `M' in CMOS stands for metal. That terminology is already a throwback to the days when transistor gates were made by depositing metal (usually Aluminum) rather than poly-silicon over the silicon substrate. So it's somewhat ironic that, as part of the joint announcement, Gordon Moore was trotted out to publicly proclaim:

"The implementation of high-k and metal gate materials marks the biggest change in transistor technology since the introduction of polysilicon gate MOS transistors in the late 1960s"

I should point out here that the Hafnium metal is used in the so-called high-k gate oxide layer (the yellow strip in Figure 2), not the metal gate itself. Neither Intel nor IBM disclosed what type of metals will be used in the gate. The key issues at each technology node defined in Table 2 are usually the same:

  • Yield. How many chips actually operate to spec when the wafer comes out of the fab line?

  • Process variation. As the scale gets smaller, consistency of process control also decreases.

  • Thermal power dissipation. See Section 5.

Each of these issues comes into play with different levels of significance at each SIA node. In fact, it turns out that Moore's law has died many little deaths (le petit Moore?) since he first proposed it (but people have short memories). Indeed, Moore's remark could be interpreted as an acknowledgment that the transition to 45 nm was closer to a near death experience than anyone has witnessed before.

Take a look at the picture on the right. No, that's not a Google Earth image of a football field at the edge of town, that's a photograph of the Penryn microprocessor chip. This means it's real! The claims for the new Hafnium-oxide/metal-gate technology include:

  • Approximately 2x improvement in transistor density, for either smaller chip size or increased transistor count

  • Approximately 30% reduction in transistor-switching power

  • Greater than 20% improvement in transistor-switching speed

  • More than 5x reduction in source-drain leakage

  • Greater than 10x reduction in gate-oxide leakage

It remains to be seen whether these attributes really translate into the so-called Moore's law II curve and lead to faster, cooler single processors or just somewhat faster and somewhat cooler multicores.

7  Conclusion

Finally, let's see if I can answer my own questions.

Why a joint announcement?

Announcing with IBM (the actual fab vendor) draws less of the wrong kind of attention than announcing with AMD, which is the direct competitor. Also, it seems that both IBM and Intel benefitted directly from VLSI research performed several years earlier at Sematech. A joint announcement tends to dissipate any feud over who discovered what processing tricks first. Do I smell a lot of cigar smoke and back room deals here?

How can Intel upgrade all their fab lines so fast?

This question has a rather startling answer. Since all of Intel's fab facilities are fully automated and software controlled, the correct processing parameters and machine schedules are more or less uploaded from their Hillsboro research fab. Think about that; Intel will upload an entire factory!

Why put all your silicon eggs in the same fab basket?

I expect this is primarily a response to the tremendous competitive pressure Intel finds itself under. Since they have little to lose, they're going for broke. Let's hope there are no bugs in that fab software they upload from Hillsboro.

Are multicores dead?

Here's where I suspect Intel is not going to put all their eggs in the same basket. They don't have to. They can keep the multicore option open in case the expected efficiencies at 45 nm don't pan out or Moore's law dies another little death.

Has Moore's law been resuscitated?

To answer this question, we'll just have to wait and see. But the wait is only about 6 months, according to Intel.

"No exponential is forever, but we can delay 'forever'."—Gordon Moore