Today, x64 chip maker Advanced Micro Devices will launch its "Shanghai" quad-core Opteron processors for servers and workstations, concurrent with its annual financial analyst day meeting and ahead of schedule by AMD's reckoning.
The new chip - which offers more computing power, less heat dissipation, and better bang for the buck than its "Barcelona" predecessor - closes some of the gaps that rival Intel has opened up with its Xeon x64 chips. And it gives AMD a chance to get its business back on track and get its server partners somewhat enthusiastic about AMD after a number of them got burned by the chip maker's delay in getting its initial quad-core chips out the door last year.
That gave Intel an opening to chew into market share gains that AMD made with single-core and dual-core Opterons since the product was launched in April 2003. And with a revamped, Core-based product line, the Xeons certainly have done that, despite some technical reasons why the Opteron design is better.
The Opteron was launched to much fanfare as the first 64-bit x86-compatible processor, and one with a funky interconnect called HyperTransport that looks an awful lot like the QuickPath Interconnect that Intel hopes to start rolling out in servers early next year and which makes its debut in the Core i7 processors any day now. While the tier one server vendors were a little cold to the Opterons at first, IBM played around a bit with one machine, and then Sun Microsystems said it would partner with AMD to create a new "Galaxy" server line, followed by Hewlett-Packard, which created Opteron variants in its ProLiant rack and BladeSystem blade server lines.
Dell eventually got into the act as well, and supercomputer maker Cray based a big part of its future on the Opteron too. When Intel got the jump on AMD with quad-core chips, that left Sun and Cray in the lurch and it left AMD to fend for itself with dual-core chips. A bug discovered in the Barcelona Opterons in late 2007, only a few months after AMD's first quad-core chips came out, pushed Barcelona shipments into the first quarter of this year. After that, heads rolled at AMD, the company rejiggered its engineering and quality control processes, and, according to John Fruehe, director of AMD's server and workstation division, did a very fast ramp.
"This is the fastest we have ever done from initial wafer to production parts," Fruehe says. Initial wafers were in early 2008, about the time Barcelona's bugs were worked out and production on that chip was ramping on a 65 nanometer process. AMD had not expected to get Shanghai kickers out of its fabs and into production systems until the very end of the fourth quarter of 2008, but depending on how you do the math, Fruehe says the chip is anywhere from 8 to 10 weeks ahead of schedule. (The first Shanghai parts hit the street in mid-October, apparently.)
And that is all the more remarkable given that the chip is made using a new 45 nanometer water-immersion lithography technique that AMD is using for the first time. Fruehe says that AMD did not have to use immersion lithography for 45 nanometer chips, but chose to because it wants to gain experience with the technique well ahead of the 32 nanometer generation of processors, which does require it. Of course, this will mostly be a transition problem that AMD's soon-to-be-spun-out foundry will have to cope with. Going forward, AMD will be a chip designer and marketer, and the Foundry Company, its fab spinout, will do the chip making.
Against this backdrop of wrenching change at AMD and an increasingly recessionary IT spending environment enter the Shanghai Opterons, technically known as the 45 nm Quad-Core AMD Opteron Processors. As I said, the Shanghai Opterons have four cores on a single die, and the chips plug into the same 1207-pin Socket F socket as the Barcelona and dual-core "Santa Rosa" Opterons. Each Shanghai core has 512 KB of L2 cache memory, and there is a shared 6 MB L3 cache on the chip, which is three times that offered on the Barcelonas. That extra L3 cache will be useful for supporting virtualized server environments, and so will the faster 800 MHz DDR2 main memory that the Shanghai chips support.
The good yields with the 45 nanometer processes used to make the Shanghai chips have also, according to Fruehe, allowed AMD to come out with 2.7 GHz standard parts (which have a 75 watt rating), and that is 200 MHz faster than the Barcelona Special Edition (SE) variants, which run at 2.4 GHz and 2.5 GHz and which burn 105 watts. This is a pretty big swing in performance per watt. And for standard parts, it means Shanghai has 400 MHz more oomph at the same 75 watts. Take your pick.
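For readers who want to check that swing, here is a rough sketch in Python that uses clock speed per rated watt as a crude stand-in for performance per watt. The Shanghai and Barcelona SE figures are the ones quoted above; the 2.3 GHz Barcelona standard part is inferred from the 400 MHz gap rather than stated outright, so treat it as an assumption.

```python
# Clock speed (MHz) and rated power (watts) for the parts discussed above.
PARTS = {
    "Shanghai standard": (2700, 75),
    "Barcelona SE": (2500, 105),
    # Assumption: inferred as 2.7 GHz minus the quoted 400 MHz gap.
    "Barcelona standard": (2300, 75),
}


def mhz_per_watt(name):
    """Clock speed divided by rated power: a crude efficiency proxy."""
    mhz, watts = PARTS[name]
    return mhz / watts


def gain_over(new, old):
    """Percentage improvement in MHz per watt of `new` over `old`."""
    return (mhz_per_watt(new) / mhz_per_watt(old) - 1) * 100


if __name__ == "__main__":
    for name in PARTS:
        print(f"{name}: {mhz_per_watt(name):.1f} MHz per watt")
    for old in ("Barcelona SE", "Barcelona standard"):
        print(f"Shanghai standard vs {old}: {gain_over('Shanghai standard', old):+.1f}%")
```

On these numbers the Shanghai standard part works out to 36 MHz per watt against roughly 24 for the fastest Barcelona SE. Clock speed is not performance, of course, so this only bounds the argument rather than settling it.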
Because the Shanghai chips have the same pinout and thermals as the Barcelonas, server makers are going to be able to snap them into their products pretty easily.
Shanghai gets its tweak on
In addition to the larger cache and slightly higher clock speeds, the Shanghai Opterons have a number of tweaks. The chips support HyperTransport 3.0 links, although this capability will not be delivered until early 2009 when AMD rolls out its own Opteron chipsets. Memory management features in the AMD-V virtualization extensions to the x64 architecture have been rejiggered to more efficiently support virtual machines as they consume memory. The chips also have compatibility features that will allow VMs to bounce across generations of AMD chips, a feature set the company calls extended migration. (Intel offers a similar feature, called flex migration.)
The Shanghai chips also include a feature called smart fetch, which moves data out of L1 and L2 memories on a core that is not being used out into L3 cache and then idles the core. The net effect of this feature on a normal workload is that a Shanghai standard part can consume as little power as a Barcelona Highly Efficient (HE, for a low-voltage part) chip, which is rated at 55 watts.
As for performance, AMD is telling customers that clock for clock, a Shanghai will show about a 20 per cent performance improvement over a Barcelona chip, and that for standard chips, comparing the fastest parts, the Shanghai will deliver a 35 per cent performance improvement. As for power consumption, a Shanghai chip under load will consume about 10 per cent less juice, but chips at idle can consume as much as 35 per cent less juice than a Barcelona. This is important, since most x64 chips do not work full-out even most of the time.
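Those claims can be folded into a single performance-per-watt figure. The Python sketch below does the division, on the assumption (not confirmed by AMD) that the 10 per cent power saving under load applies to the same pairs of chips as the performance claims.

```python
# AMD's claimed Shanghai-versus-Barcelona deltas, as quoted above.
PERF_GAIN_CLOCK_FOR_CLOCK = 0.20  # same clock speed
PERF_GAIN_TOP_PARTS = 0.35        # fastest standard part against fastest standard part
POWER_CUT_UNDER_LOAD = 0.10       # assumed to apply to both comparisons


def perf_per_watt_gain(perf_gain, power_cut):
    """Relative performance-per-watt improvement from a perf gain and a power cut."""
    return (1 + perf_gain) / (1 - power_cut) - 1


if __name__ == "__main__":
    clock_for_clock = perf_per_watt_gain(PERF_GAIN_CLOCK_FOR_CLOCK, POWER_CUT_UNDER_LOAD)
    top_parts = perf_per_watt_gain(PERF_GAIN_TOP_PARTS, POWER_CUT_UNDER_LOAD)
    print(f"Clock for clock: {clock_for_clock:.0%} better performance per watt")
    print(f"Top standard parts: {top_parts:.0%} better performance per watt")
```

If the assumption holds, that is about a third more work per watt clock for clock, and about half again as much for the top standard parts.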
In the Opteron 2000 series, which is used in two-socket servers, there are five Shanghai chips. The Opteron 2376 runs at 2.3 GHz and costs $377. The 2378 runs at 2.4 GHz and costs $523. The 2380 runs at 2.5 GHz and costs $698. The 2382 runs at 2.6 GHz and costs $873. And the 2384 runs at 2.7 GHz and costs $989. The three lowest speed parts coincide roughly with the top three Barcelona parts in terms of pricing, and those Barcelona parts run 200 MHz slower. (These prices are for single chips bought in a volume of 1,000-unit trays, as usual.)
In the Opteron 8000 series, which is used in four-socket and larger servers, there are four Shanghai chips. The Opteron 8378 runs at 2.4 GHz and costs $1,165. The 8380 runs at 2.5 GHz and costs $1,514. The 8382 runs at 2.6 GHz and costs $1,865. And the 8384 runs at 2.7 GHz and costs $2,149. The bottom two Shanghai chips have the same prices as the two top-end Barcelona parts and offer that extra 200 MHz as well.
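To put the two price lists side by side, here is a small Python sketch that works out dollars per GHz for each part and the premium AMD charges for an 8000 series chip over the 2000 series chip at the same clock speed. The model numbers, clocks, and prices are the ones quoted above; the function names are just illustrative.

```python
# (model, clock in MHz, price in US dollars per chip in 1,000-unit trays)
OPTERON_2000 = [
    ("2376", 2300, 377),
    ("2378", 2400, 523),
    ("2380", 2500, 698),
    ("2382", 2600, 873),
    ("2384", 2700, 989),
]
OPTERON_8000 = [
    ("8378", 2400, 1165),
    ("8380", 2500, 1514),
    ("8382", 2600, 1865),
    ("8384", 2700, 2149),
]


def dollars_per_ghz(lineup):
    """Map each model to its price divided by its clock speed in GHz."""
    return {model: price / (mhz / 1000) for model, mhz, price in lineup}


def four_socket_premium():
    """Ratio of 8000 series to 2000 series price at each shared clock speed."""
    two_socket = {mhz: price for _, mhz, price in OPTERON_2000}
    return {
        mhz: price / two_socket[mhz]
        for _, mhz, price in OPTERON_8000
        if mhz in two_socket
    }


if __name__ == "__main__":
    for lineup in (OPTERON_2000, OPTERON_8000):
        for model, cost in dollars_per_ghz(lineup).items():
            print(f"Opteron {model}: ${cost:,.0f} per GHz")
    for mhz, ratio in four_socket_premium().items():
        print(f"{mhz / 1000:.1f} GHz: 8000 series costs {ratio:.2f}x the 2000 series part")
```

At every shared clock speed the 8000 series part costs a bit more than twice its 2000 series counterpart, which is the price of admission to four-socket and larger boxes.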
Clock for clock, the Shanghai Opterons offer about a 40 per cent price break compared to the Barcelona chips, but Fruehe says most customers who buy servers tend to stay within a particular chip and server price band and they take the extra performance instead of downshifting their spending. That, of course, may change, given the shakiness of the global economy.
Incidentally, AMD is still selling Barcelona chips, and does so because some vendors have qualified the processor for particular products and some customers have qualified it for particular applications. But given the relatively short lifespan of Barcelona - server makers started shipping products in volume only in April - and the advantages of Shanghai chips, AMD does not expect Barcelona chips to be in the product catalog for more than a year or so.
In the first quarter of 2009, AMD expects to get low-voltage Shanghai HE parts into the field as well as faster (and hotter) Shanghai SE parts for companies that just want the most performance they can get, pricing and heat be damned. Fruehe would not say if AMD would be able to break 3 GHz on the Shanghai SE parts. But it seems likely that AMD will try to break that barrier. ®