ISSCC Intel was not about to pre-announce all the feeds and speeds of its future Xeon and Itanium processors at the IEEE's International Solid-State Circuits Conference in San Francisco this week. But its chip engineers are just like all the others attending the event. They want to show off the electrical engineering marvels they have created, and they did lift a little curtain on future "Sandy Bridge-EP" and "Westmere-EX" Xeon processors, due later this year.
As El Reg previously reported, the chip giant also closed out the enterprise processor sessions at ISSCC with details on the forthcoming "Poulson" Itanium chip, which sports eight completely redesigned cores. So that Itanium would not get lost in the shuffle and would get top billing as the press rolled into ISSCC on Sunday, Intel did prebriefings on the Poulson chip last week, which let the stories come out ahead of the paper presented by engineer Reid Riedlinger.
The eight-core Poulson chip is the kicker to the current quad-core "Tukwila" Itanium 9300 processor. It was designed by a team of engineers spread across Intel's Fort Collins, Colorado, and Hudson, Massachusetts, facilities, which are historically connected to the chip design teams affiliated with the PA-RISC chips from HP and the Alpha chips from Digital Equipment. Intel borged those teams many years ago in the wake of HP and Compaq/Digital adopting the Itanium chip. The Westmere-EX processor, aimed at high-end servers, was designed at Intel's Bangalore, India, facility, while the Sandy Bridge-EP chip, aimed at entry and midrange servers, was designed by engineers in Santa Clara, California.
Shankar Sawant, an engineer on the Westmere-EX team, presented a paper that generically discussed this Xeon chip, which is socket-compatible with the current "Nehalem-EX" Xeon 6500 and 7500 processors that were announced in March 2010. The Xeon 7500 chip pretty much replaced the Itanium in the affections and server lineups of everyone but HP, which has to continue to use the Itanium to support HP-UX, OpenVMS, and NonStop workloads. The Xeon 6500s are low-powered versions of the Nehalem-EX aimed at HPC clusters with only two processor sockets; the Xeon 7500s and the "Boxboro" chipset shared by Xeon 6500/7500 and Itanium 9300 chips are designed for systems with two, four, or eight sockets.
The Westmere-EX is a tick on the Intel tick-tock rhythm, which means it is a process shrink of the prior chip design. The Xeon 7500 was implemented using Intel's 45 nanometer processes and had eight cores, each with two virtual threads (what Intel calls HyperThreading) that make it look like sixteen cores to operating systems and hypervisors. The Westmere-EX will be a shrink to 32 nanometers, allowing for more components to be crammed onto the slice of silicon and for the chip to run faster or cooler at the same clock speed - whatever Intel and its customers think is best. What we know for sure is that Westmere-EX is a ten-core chip with twenty threads.
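For a rough feel of what a shrink from 45 to 32 nanometers buys, here is a back-of-envelope sketch in Python. It assumes ideal scaling, where every feature shrinks in proportion to the node name, which real processes never quite deliver. The function names are made up for illustration and none of this arithmetic comes from Intel's paper.

```python
def ideal_area_scale(old_nm: float, new_nm: float) -> float:
    """Area ratio per component under ideal scaling (an assumption,
    not a measured figure): both dimensions shrink linearly."""
    return (new_nm / old_nm) ** 2


def logical_cpus(cores: int, threads_per_core: int = 2) -> int:
    """Logical processors the OS sees with two HyperThreads per core."""
    return cores * threads_per_core


if __name__ == "__main__":
    scale = ideal_area_scale(45, 32)
    print(f"ideal area per component, 32nm vs 45nm: {scale:.2f}x")
    print(f"Nehalem-EX logical CPUs:  {logical_cpus(8)}")
    print(f"Westmere-EX logical CPUs: {logical_cpus(10)}")
```

Under that idealized assumption each component takes roughly half the area it did at 45 nanometers, which is the headroom that lets Intel add cores and cache in the same socket.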
But if you look at the block diagram for the Westmere-EX, one thing becomes immediately obvious. Take a look:
The cores are numbered and are on the outside, with the "last level cache" in the Xeon 7600 processors, as the Westmere-EX chips will almost certainly be called, all crammed into the center of the chip. In the dead center of the chip is the QuickPath Interconnect router, and the "uncore" strip that includes this router (the six black squares in the center of the die) splits the chip in two horizontally.
Twelve cores. Two amputated
There are six cores and matching L3 caches on top of the uncore, and four cores and their caches below. Yup, you got it. The Westmere-EX is actually a twelve-core design that had two cores cut off. (Intel didn't say that at ISSCC, but I am saying it and it is obviously true.) So when Intel does a process shrink to, say, 28 nanometers it will be able to tack on those two cores, crank the clocks, and put out a Xeon 7700 in fairly short order if it needs to.
Sawant said that the Westmere-EX chip had two integrated DDR3 memory controllers, which are at the bottom of the chip (labeled SMI I/O, short for Scalable Memory Interface). These controllers will support up to eight DDR3 memory channels. The chip has four QPI links along the top, and they will run at the current top-end 6.4 GT/sec speed that the Boxboro chipset supports. Intel probably won't crank QPI up to 9.6 GT/sec until another chip and chipset redesign comes down the pike. The Core C6 power states that came out with earlier "Westmere-EP" Xeon 5600 processors last year are being pulled into the Westmere-EX design.
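To put those QPI transfer rates into bandwidth terms, here is a small Python sketch. It assumes each QPI link carries 2 bytes of payload per transfer in each direction, which is the commonly quoted figure for QPI and not something Sawant stated. The constant and function names are illustrative only.

```python
# Assumption: 16 data bits (2 bytes) of payload per transfer, per direction.
BYTES_PER_TRANSFER = 2


def qpi_link_gb_per_s(gt_per_s: float) -> float:
    """Peak payload bandwidth of one QPI link in one direction, in GB/sec."""
    return gt_per_s * BYTES_PER_TRANSFER


def qpi_total_gb_per_s(links: int, gt_per_s: float) -> float:
    """Peak one-direction bandwidth summed across all links on the chip."""
    return links * qpi_link_gb_per_s(gt_per_s)


if __name__ == "__main__":
    for rate in (6.4, 9.6):
        per_link = qpi_link_gb_per_s(rate)
        total = qpi_total_gb_per_s(4, rate)
        print(f"{rate} GT/sec: {per_link:.1f} GB/sec per link per direction, "
              f"{total:.1f} GB/sec across four links")
```

These are peak numbers under the stated assumption; protocol overhead and real traffic patterns will pull delivered bandwidth lower.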
The Westmere-EX chip will use a bi-directional ring to link all of the L3 caches to all of the cores on the chip. The routers that control this ring interconnect are to the left and right of the QPI routers at the center of the chip, and there are a dozen ring stops on the two rings, which are etched in metal layers 7 and 8 on the chip. (The Westmere-EX is implemented using Intel's nine-layer 32 nanometer wafer-baking process, which adds in strained silicon.) Here's what the rings look like, conceptually:
The rings use a 32-byte wide data path, which is half the width of a cache line, and more than 1,200 wires in layers 7 and 8 of the metal comprise this ring interconnect. With each tick of the CPU clocks, data can move one stop on the ring either clockwise or counter-clockwise.
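The ring arithmetic can be sketched in a few lines of Python. This is a toy model, not Intel's design: it assumes the dozen ring stops are numbered 0 through 11 around the ring, that traffic always takes the shorter direction, and that nothing ever waits on arbitration or congestion. The 64-byte cache line follows from the 32-byte path being half a line.

```python
RING_STOPS = 12          # a dozen ring stops, per the presentation
DATA_PATH_BYTES = 32     # width of the ring data path
CACHE_LINE_BYTES = 64    # twice the data path width


def ring_hops(src: int, dst: int, stops: int = RING_STOPS) -> int:
    """Fewest hops between two stops on a bi-directional ring.
    One hop corresponds to one clock tick in this simplified model."""
    clockwise = (dst - src) % stops
    counter_clockwise = (src - dst) % stops
    return min(clockwise, counter_clockwise)


def transfers_per_cache_line() -> int:
    """Number of ring transfers needed to move one full cache line."""
    return CACHE_LINE_BYTES // DATA_PATH_BYTES


if __name__ == "__main__":
    worst = max(ring_hops(0, d) for d in range(RING_STOPS))
    others = [ring_hops(0, d) for d in range(1, RING_STOPS)]
    print(f"worst-case hops from stop 0: {worst}")
    print(f"average hops to another stop: {sum(others) / len(others):.2f}")
    print(f"transfers per cache line: {transfers_per_cache_line()}")
```

The point of running the ring in both directions shows up in the worst case: no stop is ever more than half the ring away.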
Sawant says that the Westmere-EX chip has 1,567 pins, with 717 of the pins being dedicated to signal I/O. Here's what the Westmere-EX package looks like from the outside:
The package measures 49.1 millimeters by 56.4 millimeters and uses a 14-layer organic substrate. The heat spreader on the top of the package measures 35.5 by 43.1 millimeters.
Mystery 32nm silicon
Shenggao Li, a chip engineer from the Santa Clara team, gave a presentation on clock generation for a "32nm server processor with scalable cores" that no doubt is the future Sandy Bridge-EP Xeon processor for two-socket servers.
Here's what the chip actually looks like:
And here is what the block diagram for the chip looks like:
Like the future Westmere-EX Xeon and Poulson Itanium, this future Sandy Bridge-EP processor has a "core-out" design that puts the shared L3 caches in the center. Like the Westmere-EX, the Sandy Bridge-EP chip has a dozen ring stops on this internal cache interconnect, even though it only has eight processor cores. You can see the extra two L3 cache segments at the top and bottom of the central cache, and each of those segments has two ring stops instead of one.
This implies that the Sandy Bridge-EP is actually designed to scale to a dozen cores, which Intel will be able to do by stretching the caches out and adding cores to the ring. The scissors on the right side of the block diagram near the word "Scalability" imply this, but Li never said Intel intended to ramp the core count on this Xeon to a dozen cores. But clearly the chip maker can do so if Advanced Micro Devices starts winning the Core Wars among x64 server buyers.
The Sandy Bridge-EP chips have QPI buses at the top of the chip. Intel has not said how many, but the same two as the prior Xeon 5500 and 5600 processors seems almost certain, and very likely running at the same peak 6.4 GT/sec speed. The chip has an integrated PCI-Express peripheral controller on top and two DDR3 main memory controllers at the bottom.
Intel did not provide a shot of the Sandy Bridge-EP chip package, but Li said that the die will measure around 20 by 20 millimeters and will be packed with 2.2 billion transistors. ®
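Those two figures give a rough transistor density, worked out in the Python sketch below. It takes Li's "around 20 by 20 millimeters" at face value as exactly 20 by 20, so treat the result as a ballpark rather than a spec; the function names are illustrative.

```python
def die_area_mm2(width_mm: float, height_mm: float) -> float:
    """Die area in square millimeters for a rectangular die."""
    return width_mm * height_mm


def transistors_per_mm2(transistors: float, area_mm2: float) -> float:
    """Average transistor density across the whole die."""
    return transistors / area_mm2


if __name__ == "__main__":
    area = die_area_mm2(20, 20)                    # approximate die size
    density = transistors_per_mm2(2.2e9, area)     # 2.2 billion transistors
    print(f"die area: about {area:.0f} square millimeters")
    print(f"density: about {density / 1e6:.1f} million transistors per square millimeter")
```

That average lumps dense cache arrays in with sparser logic and I/O, so it says little about any one region of the die.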