MeasureIT – Issue 5.05 – Moore’s Law: More or Less? by Neil J. Gunther

Moore's Law: More or Less?


May, 2007

by Neil J. Gunther





About the Author



1  The Big Announcement



For the past few years Intel, AMD, IBM, Sun, and other microprocessor vendors have been
aggressively promoting the concept of multicores.
Multicore processors
contain multiple executions units or cores on the same silicon die and are packaged
as a single module. The message has been that more cores are better than a single core or CPU.
Last January, however, Intel and IBM made a highly unorthodox
joint announcement.
What makes this announcement unorthodox is that IBM and AMD have formed a
partnership
in the manufacture of future microprocessors, so IBM and Intel are competitors in this context.
In case you missed it, here's what they said in a nutshell.
Intel and IBM will produce single CPU parts using state-of-the-art
45 nanometer (nm) technology. Intel said it is converting all its manufacturing facilities
(fab lines, in the vernacular) and will produce 45 nm microprocessors (code named penryn) by the end
of this year. The IBM and AMD schedule is a bit less aggressive but they expect to ship their respective
Power series and Opteron series microprocessors by the middle of 2008.




Add to this picture the running commentary that Moore's law (see Section 3) is
finished, kaput, dead
(even according to Gordon!),
or possibly risen from the dead.
Any time commercial enemies get together for a public lovefest in such a volatile marketing milieu,
it warrants closer scrutiny because it usually means there is something major
going on. When I saw this announcement, some of the questions that went through my mind were:


  1. Why was a joint announcement necessary?


  2. How can Intel upgrade all their silicon wafer fabrication lines so fast?


  3. Why would Intel put all their silicon eggs in the same fabrication basket?


  4. Does this mean multicores are already a thing of the past?


  5. Has Moore's law been resuscitated?


Since I have some past experience with VLSI technology from the time
when I was a researcher at Xerox PARC, I
thought you might find my speculative thoughts on these matters of some
interest; especially when you come to do your next round of server procurement. I
have to say speculative because I don't have any particular
insider information, although I do still hear a lot of rumors on
the Silicon Valley grapevine.




The answers to these and other questions are very likely to impact the way we do server
performance analysis and capacity planning in the near future.




2  Mr. Moore Meets Mr. Amdahl


Before I get into dissecting the above issues, I want to emphasize why the joint Intel/IBM announcement is
so important. It is important, not just from the marketing standpoint, but also for computer performance analysis,
and in particular, how it relates to another well-known law viz., Amdahl's law.
In case you're confused, the conclusion of Gene Amdahl's 1967 paper can be paraphrased as:



When it comes to running commercial workloads on a single CPU or a multiprocessor, the best
performance will be achieved by running it on the fastest single CPU you can get your hands on.



Skeptics are invited to see Section 2.10 of my
Perl::PDQ book
for a more formal discussion on this point.
Gene Amdahl never said anything about parallelism (in the sense we mean it today)
and he never wrote down the speedup equation that now bears his name.
BTW, it's pure marketing genius to get your name attached to an equation you never wrote!
Ironically, all of those appellations were incorrectly attributed by others in subsequent years.


Like Moore's law, Amdahl's law was based on purely empirical evidence. At that time, Amdahl was
trying to dissuade people from using multiprocessors, partly because they were more expensive
for Amdahl Corporation to build. Amdahl was correct about that then, and he's correct about it
today.




Table 1: A selection of current generation Intel and AMD multicores.




Intel Corporation Advanced Micro Devices
Series Model  Cores   Clock (GHz)   Fab (nm)  Series Model  Cores   Clock (GHz)   Fab (nm) 
 Core 2 Duo E6300 2 1.83  65   Athlon 64 X2  4400 2 2.3  65 
 Core 2 Duo E6700 2 2.66  65   Athlon 64 X2  6000 2 3.0  90 
 Core 2 Quad Q6600 4 2.40  65   Athlon 64 FX-70 4 2.6  90 
 Core 2 Extreme   QX6700  4 2.66  65   Athlon 64  FX-74  4 3.0  90 



Multiprocessors that are familiar to us as servers in air-cooled boxes are
still expensive to manufacture and engineer for reliability. A tremendous amount of effort goes into
removing race-conditions and faults in the hardware state-machines, not to mention symmetrizing the
operating system to take advantage of the multiple processors in a scalable way.
Multicores, like those shown in Table 1, can be thought of
as multiprocessor architectures shrunk down onto a silicon die and contained in a single "black box";
the multicore module. Placing too many high-speed cores in the same module can lead to overheating
problems. A burning question (if I can use that phrase) for the 45 nm technology is,
How will the clock speeds of the new multicores
compare with the speed of the fastest single processors? (See Section 5)



3  Moore's Exponential Law


Moore's law states:



The number of transistors that can be fabricated on a very large-scale integrated (VLSI) chip doubles every two years.


Just like credit card debt this is a statement about compounded or exponential growth (Figure 1),
so the ramifications can be monumental. Gordon Moore
of Intel Corporation made his empirically based pronouncement about 40 years ago.
In that time, the area of a silicon chip has remained essentially constant; about the size of postage stamp.
Therefore, to accommodate the 2 year doubling of transistor density predicted by Moore's law,
the size of the transistors and their associated "wires" have to be shrunk accordingly.
This is possible due to the fact that the transistor fabrication process is based largely on something
called photolithography. Photolithography is a highly specialized form of photography where the
transistors and their associated circuits are built up in layers on a silicon wafer substrate.
Each layer is imaged or shot onto the silicon wafer in much the
same way an image is formed on a photographic plate. Each image is actually a special kind of mask that
lets light through in certain regions and not in others.
Between each shot, the silicon wafer is cleaned and
prepared for the next layer. Intel and IBM use slightly different lithographic processes from one another,
but it requires about 50 photolithographic shots of this type to
build up today's generation of microprocessors. The key point is, to double the number of transistors
on the chip, each mask image only needs to be shrunk by a factor of two.








Figure 1: Moore's law for memory chips and microprocessors plotted on a
semi-logarithmic scale, which has the effect of making
nonlinear exponential curves appear linear. The uppermost purple curve is the Moore
projection based on data up to 1975; note the kink correction around 1980, which shows that the so-called law
is only an approximation. [Source: Intel Corporation]




In 1985 the feature size was around 1000 nanometer (nm). Today, Intel and other chip makers are
fabricating in with 65 nm feature sizes. For reference, the wavelength of visible light is about 500
nm, ultraviolet light (used in the transistor fabrication process) is about 200 nm, and the HIV
virus is about 100 nm in diameter. Obviously, Moore's shrinkage law cannot go on indefinitely
because the scale of an atom is finite in the range of 0.1-0.5 nm, and the fabrication process used
to make transistors today cannot be applied to make transistors out of individual atoms.



4  Fabulous Fab Lines



Intel has several silicon chip fabrication facilities or fab lines in Oregon, Arizona, Israel, and
elsewhere. Hillsboro, Oregon is a research fab line, rather like a "test kitchen" where new
silicon fabrication recipes are dreamt up. These fab lines are amongst the cleanest rooms in the
world. Your camera, for example, is too "dirty" to gain entry. Each is the area of three-football
fields and costs about 3 billion dollars to set up. This astronomical cost represents a huge
barrier-to-entry for any new vendor wanting to get into the game, and it also explains why there are fewer
and fewer chip vendors year after year. Over and above the sheer cost of the fab line, is the time to
design and develop a new microprocessor, which is currently about 5 years.


All of Intel's facilities are fully automated. By that I mean, the robots have taken over.
Humans are no longer involved in silicon processing anymore like in the good old days (just 20 years ago).
In fact, the picture on the right reminds me of something out of Kubrick's movie
2001 A Space Odyssey.
A robot transports the wafers, each of which is 300 mm in diameter (about the size of dinner plate),
between processing steps via an overhead monorail track system.


Each processing step is usually carried out in a separate machine. A robot transfers the
wafer-carrier from one processing machine to the next machine and so on. Any of these processes can also have
their parametric settings altered on the fly by process engineers using computers outside the clean
room. That means there is no guarantee that any two wafers have received identical processing. In
fact, there is a numeric zoo for AMD and Intel microprocessors which try to identify microprocessor chips that
received different processing. Processing includes such things as ion implantation (doping),
photoresist application and removal, oxide deposition, and most importantly, metal deposition.
That's what produces the metal conductors
(wires) that interconnect the transistors and circuits. To accommodate all the complex crisscrossing of the
interconnect network, there are 8 metal layers using copper interconnect (M1,M2,...,M8) on the new
microprocessors. The bottom layer M1 connects transistors to transistors, while the upper layers
are used to connect circuits spanning the chip.





Table 2: Major technology nodes predicted by the SIA 2006 roadmap.



 Year   2004 
 2007 
 2010   2013   2016   2020 
 nm  90 65
  45  
32 22 14



Each progression in shrinking the technology is identified by a waypoint or "node" (Table 2)
on the SIA (Semiconductor Industry Association)
roadmap.
A technology node is defined primarily by the minimum metal pitch used on any
product
, for example, the Metal-1 layer (M1) half-pitch in a microprocessor. Here,
pitch
refers to the spacing between wires. You can see that IBM and Intel are well ahead
of the SIA roadmap; another reason the announcement in Section 1 was newsworthy.
This also explains those numbers in the Fab (nm) column in
Table 1. A lot of people quote those numbers, but very few know where they come from.
Now you know too.
Apparently, Freescale Semiconductor, Inc. has not yet
found a way to get below 90 nm reliably. Since they were "fabbing" the PowerPC G5-microprocessor for the Apple Macintosh, that delay
in moving from the 2004 SAI technology node to the 2007 technology node was
leaving the Mac behind in the CPU-speed sweepstakes. That's partly why Apple Inc. had to cross
over to Intel microprocessors.


Moore's law also captures the amazing scale of cost reduction. It explains why you can buy
what was a 10,000 MIPS supercomputer a decade ago and have it in your laptop today
for a few hundred dollars. From the performance perspective, however, Moore's law is more
often associated with increasing MIPS. To understand
how Moore's law has run into trouble at 45 nm, we need to understand something about how CMOS
transistors work. CMOS stands for Complimentary Metal-Oxide Semiconductor and
I'll come back to more details about that in Section 6.


How does a transistor work? Imagine your garden hose with the water turned on at a moderate
flow from the tap to the nozzle, which is lying in a drainage ditch.
The water flowing in the hose is analogous to a current of electrons
between a source (analogous to the tap) and a corresponding sink
(analogous to the drain). In fact, two of the terminals in a transistor are called the source
(labeled S in Figure 2) and the drain (labeled D in
Figure 2). Part of the role of a transistor is to control the current flow between
the source and drain. When using your garden hose, you normally control the flow of water at the tap (the source).
The CMOS transistor does not work that way. Instead, it is more like controlling the water flow by
pressing your foot on the hose to stop it (off) and lifting your foot to start it (on).
You are likely to be more effective if you are wearing shoes. The
transistor terminal that corresponds to your shoe is called the gate and it sits over the top
of the source and drain terminals, just like the situation with the hose. The three terminals together form a triode valve, which is the silicon version of the
glowing glass tubes you can see in the back of any TV set that is more ten years old.








Figure 2: Schematic comparison of the standard transistor gate and the new high-K metal gate
(shown in blue).





Depending on how the transistor is configured in a circuit, when you push you foot down the current
is turned off and that might correspond to a digital `0'. Similarly, when you lift your foot up
that might correspond to a digital `1'. This explains how Moore's law is associated with processor speed.
As the transistor geometry becomes smaller, the distance between source and drain gets shorter
and fewer electrons need to flow under the gate. A major hurdle that has to be overcome in scaling down
from 60 nm to 45 nm technology is leakage of electrons between the source and drain.
By analogy with the garden hose, trying to stop the water flow with a conventional shoe is no
longer effective at these tiny scales, so the difference between a digital `1' and `0' becomes less distinct,
and this can cause all sorts of subtle problems.
To really pinch the hose tightly, you need to wear metal-capped boots. And

very special metal at that,
called Hafnium.

There are many other special fabrication tricks-of-the-trade required to make the 45 nm fab
technology operational, but they need not concern us here.



5  Moore Suffers Heat Stroke


We fell off the Moore's law curve, not because photolithography collided with limitations due to
quantum physics or anything else exotic, but more mundanely because it ran into a largely
unanticipated
thermodynamic barrier. In other words, Moore's law was stopped dead in its tracks by
old-fashioned 19th century physics. As CMOS feature sizes are reduced and switching speed increased,
thermal power dissipation becomes a serious problem because it is directly proportional to the clock frequency
used on the chip. A typical 2-3 GHz CPU generates on the order of 100 Watts (same as a
household light bulb). And don't forget the huge power transients at the pins of the chip, which are
essentially scale-invariant. Moreover, you end up trying to push that heat through a pinhole: the
small die size. In other words it's really the dissipation of power density that kills you.
To the degree that this power density cannot be dissipated,
the chip failure rate increases dramatically due to thermal degradation. The Apple Mac G5 dual
processor (IBM Power 5 chip) has a freon cooling system (a la Cray) together with a
spectrophotometer that detects freon leaks. If such a leak is detected, the system shuts down
immediately. This is another reason Apple went Intel.
Some designers have proposed hetero-speed cores to keep thermal dissipation problems under control.
General workloads would run on moderate speed cores and the highest speed cores would only run
under specific compiler directives.
This is yet another rendition of Amdahl's law, as described in Section 2.

As I understand it, this thermal barrier is the reason why multicores have been so heavily promoted
by the CPU vendors; keep the cores running at about current clock speeds and compensate for the
absence of higher speed by adding more cores per module to provide more aggregate MIPS.
Of course, this looks good on paper, but it comes with it's own set of problems, which those of us
who have been involved with the development of SMPs are familiar with.
Quite apart from Moore's law having been thermally defeated, many
application programmers have become rather addicted to the thrill of riding Moore's
exponential curve. Even a purely single-threaded program will run faster without any additional
programming effort. But what happens when you have to run threads across multiple cores?
Welcome to the return of
concurrent
programming
as a major performance issue for applications in the foreseeable future. Been
there, done that, 15 years ago. But this time it's worse, because the cores are inside a true
black-box; the module. Without appropriate hardware registers, any serious performance tuning will
likely have to be accomplished in software alone.



6  M is for Metal (Not Moore)


In an ironic twist of fate, the new CMOS transistor technology actually hearkens back to the
earliest transistor implementations. When I was involved with VLSI design tools at Xerox PARC, we used the Mead-Conway design rule:
"poly over silicon" produces a transistor. The word poly refers to polysilicon or amorphous
silicon. The Mead-Conway
rule
was actually shorthand for a layer of poly-Si over an implied silicon dioxide insulator
over a doped, over a Si substrate with implied source and drain, leads to a transistor being
produced from the photolithographic masks generated by the VSLI CAD tool
.
Now, recall from Section 4 that the `M' in CMOS stands for metal. That terminology is
already a throwback to the days when transistor gates were made by depositing metal (usually
Aluminum) rather than poly-silicon over the silicon substrate. So it's somewhat ironic that, as part
of the joint announcement, Gordon Moore was trotted out to publicly proclaim:



"The implementation of high-k and metal gate materials marks the biggest
change in transistor technology since the introduction of
polysilicon gate MOS transistors in the late 1960s"


I should point out here that the Hafnium metal is used in the so-called high-k gate oxide layer
(the yellow strip in Figure 2), not
the metal gate itself. Neither Intel nor IBM disclosed what type of metals will be used in the gate.

The key issues at each technology node defined in Table 2 are usually the same:


  • Yield. How many chips actually operate to spec when the wafer comes out of the fab line?

  • Process variation. As the scale gets smaller, consistency of process control also decreases.

  • Thermal power dissipation. See Section 5.

Each of these issues comes into play with different levels of significance at each SIA node. In
fact, it turns out that Moore's law has died many little deaths (le petit Moore?) since he
first proposed it (but people have short memories). Indeed, Moore's remark could be interpreted as
an acknowledgment that the transition to 45 nm was closer to a near death experience than
anyone has witnessed before.



Take a look at the picture on the right. No, that's not a Google Earth image of a football field at
the edge of town, that's a photograph of the Penryn microprocessor chip. This means it's real! The
claims for the new Hafnium-oxide/metal-gate technology include:


  • Approximately 2x improvement in transistor density, for
    either smaller chip size or increased transistor count


  • Approximately 30% reduction in transistor-switching power

  • Greater than 20% improvement in transistor-switching speed

  • More than 5x reduction in source-drain leakage

  • Greater than 10x reduction in gate-oxide leakage

It remains to be seen whether these attributes really translate into the so-called Moore's law
II
curve and lead to faster, cooler single processors or just somewhat faster and somewhat cooler
multicores.



7  Conclusion

Finally, let's see if I can answer my own questions.


Why a joint announcement?

Announcing with IBM (the actual fab vendor) draws less of the wrong kind of attention than
announcing with AMD who is the direct competitor. Also, it seems that both IBM and Intel benefitted
directly from VLSI research performed several years earlier at
Sematech. A joint announcement tends
to dissipate any feud over who discovered what processing tricks first. Do I smell a lot of cigar
smoke and back room deals here?

How can Intel upgrade all their fab lines so fast?

This question has a rather startling answer. Since all of Intel's fab facilities are
fully automated and software controlled, the correct processing parameters and machine schedules are more or less
uploaded from their Hillsboro research fab. Think about that;
Intel will upload an entire factory!

Why put all your silicon eggs in the same fab basket?

I expect this is primarily a response to the
tremendous competitive pressure Intel finds itself under. Since they have little to lose, they're going for broke.
Let's hope there are no bugs in that fab software they upload from Hillsboro.

Are multicores dead?

Here's where I suspect Intel is not going to put all their eggs in the
same basket. They don't have to. They can keep the multicore option open in case the expected efficiencies
at 45 nm don't pan out as expected or Moore's law dies another little death.

Has Moore's law been resuscitated?

To answer this question, we'll just have to wait and see. But the wait is only about 6 months,
according to Intel.

"No exponential is forever, but we can delay 'forever'."—Gordon Moore