The New Era of Scaling

Today I’m hanging out at In-Stat’s annual Microprocessor Forum. Very kind of the organizers to let me have a press pass courtesy of this blog. I’m happy to take advantage of the nebulous definition of “media” these days!

The keynote was given by Intel’s Mark T. Bohr. I haven’t kept fully up to date on the GPU business (by which I mean general-purpose processors, not graphics chips: the stuff you’re probably running your browser on). For example, I didn’t know that Intel was opening a large fab in China, scheduled for 2010.

Intel has executed about as poorly as a company can over the past several years while still surviving as a profitable business. The Itanium debacle has astonished me since its release: first, how impressive the initial technology announcement was (one of the first major technical presentations I watched live on web “TV”, in the mid-90s), and second, how quickly it was evident that the strategy was a mistake. Intel then proceeded like a drunk oil tanker that refused to come to terms with reality. Billions in shareholder cash were spent in vain because nobody would state the obvious. AMD, of course, capitalized on this and regained market share and breathing room.

But Intel seems back in form over the past year or so. The Core 2 Duo systems (in particular the MacBook Pro I’m writing this on) are incredible and not matched by AMD; and Intel is continuing to rediscover the fact that the x86 instruction set (IA) pays all the bills at Intel, and so perhaps shouldn’t be treated with total disregard. Many of the announcements from IDF in Beijing last month were reinforced in various Intel talks here today. But let’s first set the stage.

People frequently confuse Moore’s law with performance; Moore predicted the number of transistors on a chip, not speed increases. The two have historically gone hand-in-hand: smaller transistors allow shorter switching times, so the frequency went up. But as has long been anticipated, power consumption (and the related heat dissipation) goes up super-linearly with clock speed. For a long time the perceived solution was to switch to completely new processor architectures. This was a key driver of the Itanium product. In reality, the practicality of the x86 instruction set and the convention of a single (large) central processing unit has dominated.

So, incredibly, Moore’s Law is now looking at several more years of good health, approaching a half-century (Moore originally described his prediction in 1965, but only expected it to hold for about 10 years).

65 nm has been in production since October 2005, and crossover (when more 65 nm CPUs shipped than all other types combined) came about a year later.
The 45 nm process will go into production in the second half of this year, i.e. on that same two-year cycle, so expect these processors to be common about a year from now. So that’s when I’ll upgrade my MacBook Pro to a quad system…
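To put rough numbers on that two-year cadence, here is a small Python sketch of the idealized transistor-density gain from one node to the next. It assumes transistor area scales with the square of the feature size, which real layouts only approximate; `density_gain` is a hypothetical helper name for illustration.

```python
# Idealized node-to-node density gain, ASSUMING transistor area scales with
# the square of the minimum feature size. Real designs scale less cleanly;
# this only shows the arithmetic behind "about 2x per node".

NODES_NM = [65, 45, 32]  # process nodes discussed in this post


def density_gain(old_nm: float, new_nm: float) -> float:
    """Idealized transistor-density gain when shrinking from old_nm to new_nm."""
    return (old_nm / new_nm) ** 2


if __name__ == "__main__":
    for old, new in zip(NODES_NM, NODES_NM[1:]):
        print(f"{old} nm -> {new} nm: about {density_gain(old, new):.2f}x density")
    print(f"65 nm -> 32 nm overall: about {density_gain(65, 32):.2f}x")
```

Run as a script, it prints roughly 2.09x, 1.98x and 4.13x: under this idealization each full node step is about the doubling Moore’s law calls for.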

(Oh, and on the topic of how to web-2.0-ify web 2.0 conferences – see posting below – at MPF the presentation slides were handed out on USB sticks upon registration, and there were chat servers running for posting questions to the speaker. Very nice.)

(Small footnote: it seems that Intel won’t move from 12 to 18 inch wafers until after the first 32 nm products are shipping, more in the 2011-2012 time frame; the aggregate economics of highly multicore GPUs in 32 nm on 18 inch wafers around 2012 will be quite interesting. Who knows. Maybe Vista SP3 will be ready by then.)
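To make the wafer economics a bit more concrete, here is a rough sketch using a common first-order dies-per-wafer approximation (wafer area over die area, minus an edge-loss term for partial dies). It treats “12 inch” and “18 inch” as 300 mm and 450 mm wafers; the 100 mm² die size is a made-up example, not any real Intel part, and defects, scribe lines and edge exclusion are ignored.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """First-order gross die count: wafer area / die area, minus an
    edge-loss term for partial dies around the circumference.
    Ignores defects, scribe lines and edge exclusion."""
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(gross - edge_loss)


if __name__ == "__main__":
    DIE_MM2 = 100.0  # hypothetical die size, for illustration only
    small = dies_per_wafer(300, DIE_MM2)  # "12 inch" wafer
    large = dies_per_wafer(450, DIE_MM2)  # "18 inch" wafer
    print(small, large, f"{large / small:.2f}x")
```

With these assumptions it prints 640 and 1490 dies, about 2.33x. That is slightly better than the 2.25x raw area ratio, because proportionally less of a bigger wafer is lost to partial dies at the edge.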

It’s not as if staying on this 2-year cycle is simple. The physics involved are daunting, not to mention the manufacturing challenges. This, of course, is where Intel’s strengths lie: the sheer scale needed to move GPUs forward to 45 nm and 32 nm (volume in 2010) and beyond. Intel has scheduled four fabs for the 45 nm process in 2007 and 2008.

You can dig through a lot of arcane technical terms and formulas to get a grip on the problems facing processor companies like Intel: source-drain leakage; low-k doped oxide; SiGe strained silicon; high-k dielectrics; gate oxide leakage power; polysilicon gates; etc. If you want to sound really knowledgeable at your next poolside party, just respond to the local braggart with something like:

[GEEK] I just bought [insert expensive/oddball computer description here]

[YOU] Interesting. Well, I’m waiting a bit. Intel got a 45 nanometer SRAM and logic test chip working in early 2006, with high-k metal gate transistors replacing polysilicon as the gate electrode, and that will all be in the Penryn processor later this year. That’s the stuff they described at IWGI in Tokyo back in 2003. Really cool. It also adds the SSE4 instructions, like streaming load and a sum-of-absolute-differences instruction. Great stuff for video processing. That’s my next upgrade.

[GEEK] [Kneels submissively] I am not worthy … I am not worthy …
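Why is a sum-of-absolute-differences (SAD) instruction “great stuff for video processing”? Video encoders estimate motion by sliding a small block of pixels over a reference frame and keeping the offset with the smallest SAD. Below is a plain scalar Python sketch of that idea; as I understand it, SSE4’s MPSADBW does roughly this for a 4-pixel block at eight consecutive offsets on 8-bit pixels in one instruction, but the function names here are hypothetical and this is not a bit-exact model of it.

```python
from typing import List, Sequence


def sad(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute differences between two equal-length pixel runs."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def sliding_sads(row: Sequence[int], block: Sequence[int], offsets: int = 8) -> List[int]:
    """SAD of `block` against `row` at each of `offsets` consecutive positions.
    A 4-pixel block at 8 offsets is roughly the shape of work MPSADBW covers."""
    n = len(block)
    if len(row) < offsets + n - 1:
        raise ValueError("row too short for the requested offsets")
    return [sad(row[i:i + n], block) for i in range(offsets)]


def best_offset(row: Sequence[int], block: Sequence[int], offsets: int = 8) -> int:
    """Offset with the smallest SAD, i.e. the best match for motion estimation."""
    scores = sliding_sads(row, block, offsets)
    return scores.index(min(scores))


if __name__ == "__main__":
    row = list(range(10, 111, 10))  # 11 reference pixels: 10, 20, ..., 110
    block = [50, 60, 70, 80]        # the block we are trying to locate
    print(sliding_sads(row, block))
    print("best offset:", best_offset(row, block))
```

Here the SADs fall to zero at offset 4, where the block matches exactly. A hardware instruction wins by computing all eight candidate SADs at once instead of looping.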

Put very simply, smaller transistors allow higher switching speeds, but running them at higher clock frequencies requires higher voltage, and power consumption (and thus heat dissipation) goes as roughly the cube of the voltage. In the dynamic power formula $P = C V^2 f$, the frequency $f$ scales roughly with $V$, and the capacitance $C$ varies a little with it as well, so in practice the exponent is more like 3.2. Which basically sucks. For well over 10 years this has been debated in academia and industry. In the end, technology powerhouses like Intel and IBM were able to push the problem into the future for a surprisingly long time. But at around 3-4 GHz, no further pushes worked in practice, so the industry-wide fallback position is to scale back on power and make the processors “multicore”, which basically means multiple processors on one chip. Today two cores (“duo”) are standard in desktops and laptops, and four in servers.
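A minimal Python sketch of why that fallback makes sense, assuming the classic dynamic power formula $P = C V^2 f$ with frequency proportional to voltage (so power goes as the cube), leakage ignored, and a workload that parallelizes perfectly. `multicore` is a hypothetical helper name; all numbers are relative to one baseline core.

```python
from typing import Tuple


def dynamic_power(c: float, v: float, f: float) -> float:
    """Classic CMOS switching power: P = C * V**2 * f (leakage ignored)."""
    return c * v ** 2 * f


def multicore(cores: int, scale: float) -> Tuple[float, float]:
    """Relative (power, throughput) for `cores` identical cores, each with
    voltage AND frequency scaled by `scale` versus a 1.0 baseline core.
    ASSUMES f tracks V and a perfectly parallel workload."""
    power = cores * dynamic_power(1.0, scale, scale)
    throughput = cores * scale
    return power, throughput


if __name__ == "__main__":
    for cores, scale in [(1, 1.0), (2, 0.8)]:
        p, t = multicore(cores, scale)
        print(f"{cores} core(s) at {scale:.1f}x V and f: power {p:.3f}, throughput {t:.2f}")
```

Two cores at 80% voltage and clock draw about the same power (1.024x) as one full-speed core but offer up to 1.6x the throughput, provided the software can actually keep both busy. That last condition is the catch.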

The dilemma is that software has largely ignored parallelism for a long time, because parallel programming is hard, despite decades of research. For server workloads (web servers, etc.), it’s easy to leverage multiple cores. But not so for personal productivity software, with significant exceptions such as video and image processing, and games.
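One standard way to quantify why extra cores help some workloads far more than others is Amdahl’s law: if a fraction p of a program’s work can run in parallel on n cores, the best-case speedup is 1 / ((1 - p) + p/n). The sketch below uses hypothetical parallel fractions chosen for illustration, not measurements of any real server or desktop software.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical parallel fractions, not measurements of any real program.
    workloads = [("server-style, 95% parallel", 0.95), ("desktop app, 50% parallel", 0.50)]
    for label, p in workloads:
        speedups = ", ".join(f"{n} cores: {amdahl_speedup(p, n):.2f}x" for n in (2, 4, 80))
        print(f"{label}: {speedups}")
```

With only half the work parallel, even 80 cores give less than a 2x speedup, which is exactly the problem for desktop software on multicore chips.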

So what Intel is doing here is really two things. First, throwing as much obscure semiconductor physics at the problem as they can to mitigate the power consumption problem. Second, trying to exploit as much parallelism as possible. Dual and quad cores are the more obvious architectural changes, but more subtly, the x86 instruction set is incrementally adding more vector operations. SSE4 (coming in the 45 nm Penryn) includes things like dot-product instructions, and Penryn also adds a “super shuffle” engine that performs 128-bit shuffle operations in a single pass. This is stuff that is directly usable by various flavors of video and image processing. I would expect this overall trend to continue: the growing transistor budget allows for a very long tail of specialized instructions that hide large, specialized compute engines on the chip.
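To give a feel for what an SSE4 dot-product instruction does, here is a scalar Python model of DPPS as I understand its documented behavior: the high four bits of the 8-bit immediate pick which of the four lanes take part in the multiply, and the low four bits pick which destination lanes receive the sum (the rest are zeroed). The function name `dpps` is just a label for this sketch; Python floats are double precision, so rounding can differ from the real single-precision instruction.

```python
from typing import List, Sequence


def dpps(a: Sequence[float], b: Sequence[float], imm8: int) -> List[float]:
    """Scalar model of an SSE4.1 DPPS-style dot product on 4-lane vectors.

    High nibble of imm8: lanes that take part in the multiply.
    Low nibble of imm8: destination lanes that receive the sum (others get 0).
    """
    if len(a) != 4 or len(b) != 4:
        raise ValueError("DPPS operates on 4-lane vectors")
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in 8 bits")
    products = [float(a[i]) * float(b[i]) if imm8 & (1 << (4 + i)) else 0.0
                for i in range(4)]
    # Pairwise summation order, modeled on Intel's documented pseudocode.
    total = (products[0] + products[1]) + (products[2] + products[3])
    return [total if imm8 & (1 << i) else 0.0 for i in range(4)]


if __name__ == "__main__":
    a, b = [1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]
    print(dpps(a, b, 0xFF))  # all lanes multiplied, sum broadcast to all lanes
    print(dpps(a, b, 0x71))  # lanes 0-2 multiplied, sum written to lane 0 only
```

The mask makes one instruction cover both full 4-element dot products and 3-element ones (handy for RGB pixels or 3D vectors), which is the kind of specialization the paragraph above describes.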

Both the physics and the architecture extensions play to Intel’s strengths. Only Intel can drive significant new instruction set extensions and expect wide adoption. AMD had a leg up by being first with 64-bit support, but Microsoft’s glacial tempo in rolling out OS support for it pretty much doomed 64-bit computing as a way for AMD to gain significant market share. And Intel, like good old IBM in the old days, had a skunkworks project they could dust off quickly to match the basic abilities.

Similar architectural changes will occur with grids of cores on a single die; again, something Intel stands to gain from. One research project Intel talked about today is their teraflops-on-a-chip effort: a single 80-core die (simple floating-point cores rather than full IA cores) that peaks at close to 2 teraflops at its highest clock speeds, and delivers 1 teraflops at about 3 GHz while drawing just 62 watts, which is amazing. Again, following what Pat announced at IDF last month, it seems Intel plans to turn this into a product.
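A quick back-of-the-envelope check on those numbers, as a Python sketch. It takes the 1 teraflops at 62 watts figure and the 80 cores quoted for the research chip, and assumes a clock of roughly 3 GHz at that operating point; the constant names are just labels for this illustration.

```python
# Figures quoted for the teraflops research chip; the ~3 GHz clock at the
# 1-teraflops operating point is an ASSUMPTION for this illustration.
TERAFLOPS = 1.0e12  # floating-point operations per second
CORES = 80
CLOCK_HZ = 3.0e9
POWER_W = 62.0

flops_per_core = TERAFLOPS / CORES                     # 12.5 GFLOPS per core
flops_per_core_per_cycle = flops_per_core / CLOCK_HZ   # a little over 4
flops_per_watt = TERAFLOPS / POWER_W                   # about 16 GFLOPS per watt

if __name__ == "__main__":
    print(f"{flops_per_core / 1e9:.1f} GFLOPS per core")
    print(f"{flops_per_core_per_cycle:.2f} flops per core per clock")
    print(f"{flops_per_watt / 1e9:.1f} GFLOPS per watt")
```

About four floating-point operations per core per clock is what you would expect from simple cores that each do a couple of multiply-accumulates per cycle, and roughly 16 GFLOPS per watt is where the “amazing” comes from.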

But what’s really interesting is Silverthorne: a new 45 nm, high-k, low-power microarchitecture (demoed at IDF in Beijing last month). So finally Intel is coming out with a processor that is completely designed for mobile devices.

What I found coolest was a little hidden in the presentations, though it was sort of announced at IDF last month: the first genuine ultramobile IA processor will go into production later this year, sooner than expected. Silverthorne is Intel’s first IA processor designed from the ground up for low power, and it is paired with the Poulsbo chipset, together forming the Menlow UMP (Ultra Mobile Platform). Tony Smith has been covering the trickle of information on this platform.

This stuff is dramatically lower power than the Celeron “mobile” technology. In my view it’s Intel’s first real entry with the IA architecture into what would otherwise be called the “embedded” space. It’s the first serious processor there, in the sense of being able to fully run a desktop/workstation OS like Windows XP/Vista, Solaris, Linux, etc.

Intel has said that they will deliver this platform “before June 2008”, which means H1, but intriguingly, they have also said, and repeated today, that 45 nm is going into “production” later this year, which would match their typical 2-year cycle. So, when exactly will Silverthorne be available? Perhaps as early as Q4 this year.

Regardless, it will be out in products within a year from now, and that’s huge. This will introduce a new era of portable devices. The efforts of Transmeta and OQO have shown the potential here, though they are the “GO Computing equivalent” of this space: too early with their vision.

Personally, I’m looking forward to version 2.0 of Apple TV, which I hope will be a full-function Mac-in-a-box with Silverthorne inside that I can connect to my HDTV.

[Edit: thanks to my good friend RB for correcting a previous, sloppy, formulation of the power formula]

