Nehalem Day
Intel's "Nehalem EP" Xeon 5500 series of processors for two-socket servers was announced this afternoon. Finally. Now, the server market can breathe a sigh of relief and set about the difficult task of trying to peddle better boxes in a worsening economy. Which sure beats trying to sell last year's machines this year.
Today, in Santa Clara, California, Intel general manager Pat Gelsinger characterized the launch of the Xeon 5500 chips and their related chipset, the "Tylersburg" 5520, as the most important server chip launch since the Pentium Pro was introduced back in 1995.
Back then, Gelsinger explained, the Pentium Pro was the first x86 chip that had native multiprocessing capability built into it. (Some niche server vendors back then had created their own chipsets to glue multiple Pentium chips together to make SMP servers, of course, but Gelsinger didn't mention that). It was also the first Intel chip to feature out-of-order execution, a nifty technology that had been experimented with on RISC and other processors to boost performance.
The Pentium Pro laid the groundwork for the standard, high-volume server platform, Gelsinger explained, which is absolutely true, and by the time it evolved into the Xeon family, the dot-com boom exploded and x86 rack servers started flying out of the factories of Intel's partners like hotcakes. This will show you how much things have changed since then.
Back in 1995, when the Pentium Pro was launched, Intel's partners pushed maybe 700,000 servers a year, and those machines represented less than 10 per cent of the global revenue for servers. These days, the vast majority of the 8 million servers shipped per year are x64 machines, and these boxes account for a little more than half of all revenues.
It will take another dozen years before we can declare the Nehalem EP server launch to be as "transformational" as Gelsinger and his peers at Intel want us to believe it is, but the fact is that the Nehalem EP chip is exactly the high-volume, high-performance, energy-efficient chip that should have been launched years ago, not today.
Intel's customers have Advanced Micro Devices to thank for the long road from the appallingly-bad Paxville Xeon DPs back in late 2005 to the Nehalem EPs launched today. Had AMD not been aggressive with the Opteron design - including 64-bit memory extensions, multicore capabilities, point-to-point interconnection between processors, memory, and I/O, integrated memory controllers, and lots of other goodies - Intel would probably still be talking about 32-bit Xeons and the inevitable move to 64-bit Itaniums.
There are seventeen different Nehalem EP chips, including a dozen Xeon 5500 parts for two-socket servers, three Xeon 3500 variants that plug into uniprocessor servers, and two low-voltage parts that are intended to be used in embedded applications. A couple of the chips have only two cores, but most of them have four cores on the die. The chips have either 4 MB or 8 MB of L3 cache and have bandwidth on the QuickPath Interconnect interfaces that ranges from 4.8 GT/sec to 6.4 GT/sec.
The old Xeon architecture had a frontside bus that fed out to elements on the system board and ran at a certain clock speed (800 MHz, 1.33 GHz, whatever). QPI is rated instead in billions of transfers per second (GT/sec), because it is a set of point-to-point links that isn't just pointing in one direction, but in many directions.
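To put those GT/sec ratings into more familiar terms, here is a rough sketch of the conversion to gigabytes per second. The figure of 2 bytes of data per transfer in each direction is an assumption about a full-width QPI link, not something Intel stated at the launch, and the function name is made up for illustration.

```python
# Assumption (not from the launch briefing): a full-width QPI link carries
# 2 bytes of data payload per transfer in each direction.
BYTES_PER_TRANSFER = 2


def qpi_bandwidth_gb_per_sec(gt_per_sec, directions=2):
    """Approximate data bandwidth of one QPI link in GB/sec.

    gt_per_sec: link rating in billions of transfers per second.
    directions: 1 for one-way bandwidth, 2 for both directions combined.
    """
    return gt_per_sec * BYTES_PER_TRANSFER * directions


# The three QPI speeds that appear in the Nehalem EP lineup.
for rate in (4.8, 5.86, 6.4):
    both_ways = qpi_bandwidth_gb_per_sec(rate)
    print(f"{rate} GT/sec -> roughly {both_ways:.1f} GB/sec in both directions")
```

Under that assumption, the top-end 6.4 GT/sec parts work out to 12.8 GB/sec each way per link, or 25.6 GB/sec counting both directions.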
Here's the Nehalem EP lineup, with their basic feeds and speeds (name, clock speed, core count, cache, wattage, bandwidth, and price each for 1,000-unit trays).
For two-socket servers and workstations:
- W5580: 3.2 GHz, quad-core, 8 MB L3 cache, 130 watts, 6.4 GT/sec; $1,600
- X5570: 2.93 GHz, quad-core, 8 MB L3 cache, 95 watts, 6.4 GT/sec; $1,386
- X5560: 2.8 GHz, quad-core, 8 MB L3 cache, 95 watts, 6.4 GT/sec; $1,172
- X5550: 2.66 GHz, quad-core, 8 MB L3 cache, 95 watts, 6.4 GT/sec; $958
- E5540: 2.53 GHz, quad-core, 8 MB L3 cache, 80 watts, 5.86 GT/sec; $744
- E5530: 2.4 GHz, quad-core, 8 MB L3 cache, 80 watts, 5.86 GT/sec; $530
- E5520: 2.26 GHz, quad-core, 8 MB L3 cache, 80 watts, 5.86 GT/sec; $373
- E5506: 2.13 GHz, quad-core, 4 MB L3 cache, 80 watts, 4.8 GT/sec; $266
- E5504: 2 GHz, quad-core, 4 MB L3 cache, 80 watts, 4.8 GT/sec; $224
- E5502: 1.86 GHz, dual-core, 4 MB L3 cache, 80 watts, 4.8 GT/sec; $188
- L5520: 2.26 GHz, quad-core, 8 MB L3 cache, 60 watts, 5.86 GT/sec; $530
- L5506: 2.13 GHz, quad-core, 4 MB L3 cache, 60 watts, 4.8 GT/sec; $423
For uniprocessor servers and workstations:
- W3570: 3.2 GHz, quad-core, 8 MB L3 cache, 130 watts, 6.4 GT/sec; $999
- W3540: 2.93 GHz, quad-core, 8 MB L3 cache, 130 watts, 4.8 GT/sec; $562
- W3520: 2.66 GHz, quad-core, 8 MB L3 cache, 130 watts, 4.8 GT/sec; $284
And for embedded servers - core count, cache and bandwidth not yet known - which have a seven-year lifecycle as required by makers of embedded systems:
- L5518: 2.13 GHz, 60 watts; $530
- L5508: 2 GHz, 38 watts; $423
The Nehalem EP processor has just over 730 million transistors and is manufactured on a 45 nanometer Hi-K process. The Penryn microarchitecture that debuted in earlier Xeon processors has been tweaked, according to Gelsinger, including deeper out-of-order execution, branch prediction, and a slew of enhancements to make server virtualization have less overhead than it currently does.
But the virtualization software has to be tweaked to take full advantage of these new VT-d and related features, he cautioned. With the new virtualization features coupled with VMware's future vSphere virtualization hypervisor, four server makers (Cisco, IBM, Dell, and Inspire) have demonstrated about a 160 per cent improvement in performance on the VMmark virtualization benchmark from VMware compared to prior "Harpertown" Xeon 5400 servers.
"Our goal is to get native performance on virtual machines," explained Gelsinger, but he conceded that Intel may not be able to get the overhead all the way down to zero.
For plain old physical workloads, Gelsinger said that about 30 of Intel's ISV partners were seeing roughly a factor of two performance boost running applications, sometimes as much as three times, in the move from two-socket Harpertown boxes to the new Nehalem EPs. The Nehalem machines launch with over 100 applications optimized for the Nehalem architecture - the most Intel has ever had at a server launch, according to Gelsinger.
The Nehalem chips also include a feature called Turbo Boost, which allows power to be shut off completely to cores that are not needed. That, in turn, lets the clock speeds be cranked up on the remaining cores, thereby boosting the performance on existing workloads and, hopefully, reducing overall power consumption.
Intel has not yet said which of the Nehalem chips support Turbo Boost, but it does not appear to be all of them. (As we report elsewhere, Hewlett-Packard is only supporting seven Nehalem chips with the Turbo Boost mode, and only in one of the eleven servers it announced today, the DL380 G6).
Don't get too excited about Turbo Boost, though. You can't overclock a core to 4 GHz or 5 GHz if you shut down one, two, or three cores on a quad-core Nehalem EP. Intel says vaguely that on servers, depending on the chip in the box, Turbo Boost can push the clock speed up to 3.33 GHz. On workstations using the uniprocessor versions of the chips, the clocks can be cranked as high as 3.46 GHz using Turbo Boost.
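For a sense of how modest that headroom is, here is a quick back-of-the-envelope sketch. It assumes the quoted ceilings are reached by the fastest part in each list (the 3.2 GHz W5580 and W3570), which Intel has not confirmed, so treat the pairings as illustrative only.

```python
def turbo_headroom_pct(base_ghz, turbo_ghz):
    """Percentage clock gain from the base speed to the Turbo Boost ceiling."""
    return (turbo_ghz - base_ghz) / base_ghz * 100.0


# Base clocks and ceilings are the figures quoted in the article. Pairing a
# ceiling with a specific chip is an assumption made for illustration.
cases = {
    "W5580 (two-socket)": (3.2, 3.33),
    "W3570 (uniprocessor)": (3.2, 3.46),
}

for chip, (base, turbo) in cases.items():
    gain = turbo_headroom_pct(base, turbo)
    print(f"{chip}: about {gain:.1f} per cent over the base clock")
```

On those assumptions, the gain is roughly 4 per cent on the server part and 8 per cent on the workstation part.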
You can get the full Intel price list here (PDF) if you want to compare the new Nehalems to the existing "Harpertown" Xeon 5400s.
It comes as no surprise that Gelsinger declared, as Intel has many times in the past, that the era of proprietary and RISC/Unix computing is over. And he trotted out 30 benchmark tests that Intel and its partners will win in the two-socket server space with the Nehalem machines.
Comparing Nehalem EP systems to Sun's UltraSparc T2 systems, Gelsinger said that on a suite of four common benchmarks measuring different parts of the system, a Nehalem box was half the cost and delivered 1.71 times the performance. And compared to a Power6-based Power 570 server from IBM, a Nehalem EP machine was one-tenth the cost and delivered 2.45 times the oomph. "Comparing to the IBM Power environment, it is almost humorous."
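Taken at face value, those claims can be boiled down to performance per dollar. The sketch below simply divides the relative performance by the relative cost Gelsinger quoted; the function and variable names are ours, and none of it verifies the underlying benchmarks.

```python
def perf_per_dollar_ratio(relative_performance, relative_cost):
    """How many times more performance per dollar one box delivers than
    another, given its performance and cost relative to that other box."""
    return relative_performance / relative_cost


# (relative performance, relative cost) of a Nehalem EP box versus each
# rival, as claimed by Gelsinger at the launch.
claims = {
    "Sun UltraSparc T2": (1.71, 0.5),
    "IBM Power 570 (Power6)": (2.45, 0.1),
}

for rival, (perf, cost) in claims.items():
    ratio = perf_per_dollar_ratio(perf, cost)
    print(f"vs {rival}: about {ratio:.2f} times the performance per dollar")
```

If Intel's numbers hold up, that is about 3.4 times the bang for the buck against the Sun box and about 24.5 times against the IBM machine.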
Well, maybe so. But I am sure IBM doesn't think so. It would have been enlightening to see a comparison with HP's two-socket Integrity servers using Intel's Itanium 9100 series processors. But you can't expect that kind of talk from Intel. But you can expect that El Reg will be gathering up the data to check these claims by Gelsinger and to make some of the comparisons he might not be so comfortable making. Stay tuned. ®