Nehalem Day
For reasons that Sun Microsystems has yet to explain, late last week the company decided to pull its briefings on Galaxy servers based on Intel's new Xeon 5500 (née Nehalem EP), and focus instead on how Solaris 10 is being tuned for the new chip.
Sun did put out a statement about its server launch slated for April 14, and promised to show how the collaboration between Sun and Intel would result in some nifty systems.
"This is a real-world game-changer in the commodity x86 marketplace," explained John Fowler, general manager of Sun's Systems Group. "Sun's new Solaris advancements unleash the power of the Intel Xeon processor 5500 series, delivering enterprise-class engineering and innovation in this market. Sun's systems and Solaris optimization provide record-breaking performance, scalability and huge energy efficiency to help customers maximize their business results."
That's all you're gonna get for now, except for the sneak peek I already gave you from the SC08 supercomputing trade show in November 2008 of a double-wide two-socket Nehalem blade with on-board QDR InfiniBand ports. To whet your appetite for marketing glitz, Sun has also put out its own sneak-peek video on the upcoming launch, which you can watch here - if you like loud music and flashy graphics with little substance.
(A side note: Someone must like this stuff, because all IT vendors offer such video clips. Personally, when it comes to servers, I get excited by feeds, speeds, low prices. I suspect that that trio motivates real computer buyers as well, whether they're consumers or corporations. Show me those figures and cut out the noise - and save yourself some money and me some time. If I want to rock out, I will play air guitar to the first side of Boston's first album, or any of a number of Led Zep albums. Yeah, I know, I said "album." Sometimes I say "record," too - it makes my kids laugh. End of digression...)
Herb Hinstorff, director of data center software business management, walked me through some of the Solaris tunings and features for the Nehalem EPs, which is what Sun can talk about this week.
First up are Solaris 10 10/08, the update that came out last fall, and OpenSolaris 2008.11, the most-current spin of the open-source implementation of Solaris. Both support the extra SSE 4.2 instructions that Intel has put into the Nehalem chips to perform esoteric routines that make applications run faster - in this case string manipulations.
Real programmers understand why these SSE 4.2 instructions are important (for Nehalem, seven instructions were added to the set of 47 in the prior Penryn line); what I have read is that they can accelerate XML document processing, boost pattern-matching performance, and otherwise speed up networking and storage. Solaris libraries and the Studio compiler tools have been updated for these extra SSE 4.2 instructions.
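For a feel of what the string instructions buy you, here is a rough software sketch in Python of one job the SSE 4.2 string-compare family is commonly used for: scanning text 16 bytes at a time for the first byte that belongs to a small set of characters, the sort of check an XML parser makes when hunting for "<" or "&". The function names are my own, and the sketch ignores details such as NUL terminators and the other compare modes; the hardware compares all 16 bytes in a single instruction rather than looping.

```python
CHUNK = 16  # an SSE register is 128 bits wide, i.e. 16 bytes per step


def first_match_in_chunk(chunk: bytes, charset: bytes) -> int:
    """Return the index of the first byte of chunk that appears in charset.

    Returns CHUNK when nothing matches, mirroring the "no match" convention
    of returning the register width. Simplified model, not the real opcode.
    """
    if len(chunk) > CHUNK or len(charset) > CHUNK:
        raise ValueError("operands are limited to 16 bytes each")
    for i, byte in enumerate(chunk):
        if byte in charset:
            return i
    return CHUNK


def find_any(data: bytes, charset: bytes) -> int:
    """Scan data one 16-byte chunk at a time.

    Returns the offset of the first byte found in charset, or -1 if none.
    """
    for start in range(0, len(data), CHUNK):
        idx = first_match_in_chunk(data[start:start + CHUNK], charset)
        if idx < CHUNK:
            return start + idx
    return -1
```

Calling find_any(b"some text <tag>", b"<&") walks the buffer in 16-byte steps; on a Nehalem, each step of the inner loop collapses into one instruction, which is where the speed-up comes from.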
Solaris is also getting features that allow it to peer into performance counters inside the Nehalem chip, which gives application developers a chance to see what's happening in the chip - instruction retries, cache misses, and so on - to help with debugging and performance tuning.
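Raw counter values only become useful once they are turned into ratios. A minimal sketch of that arithmetic, assuming two hypothetical snapshots of counter values taken before and after a workload runs (the field names are mine, not Solaris's or Intel's):

```python
from dataclasses import dataclass


@dataclass
class CounterSnapshot:
    """Hypothetical reading of four hardware counters at one instant."""
    cycles: int
    instructions: int
    cache_refs: int
    cache_misses: int


def derive_metrics(before: CounterSnapshot, after: CounterSnapshot) -> dict:
    """Turn two snapshots into instructions-per-cycle and cache miss ratio."""
    cycles = after.cycles - before.cycles
    instructions = after.instructions - before.instructions
    refs = after.cache_refs - before.cache_refs
    misses = after.cache_misses - before.cache_misses
    return {
        # Guard against empty intervals so we never divide by zero.
        "ipc": instructions / cycles if cycles else 0.0,
        "miss_ratio": misses / refs if refs else 0.0,
    }
```

A falling instructions-per-cycle figure alongside a rising miss ratio is the classic hint that code is waiting on memory rather than doing work.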
For many years, Xeon processors have had static resource affinity tables (SRATs), which hold all the topology information for processors, memory, and other components of the system. These are necessary when building out NUMA-style servers that are collections of multiple motherboards glued together. Sun is taking this SRAT feature and a related one, called the system locality information table (SLIT), and using them to reduce the latency between processors and main memory in Nehalem systems - even if they are not, technically speaking, NUMA machines.
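To make the SRAT/SLIT idea concrete: the SRAT says which locality (node) each processor and memory range belongs to, and the SLIT is a matrix of relative distances between localities, with 10 meaning "local" by ACPI convention. Here is a sketch of how an operating system might use the pair to hand a thread memory that is close to the core it runs on. The two-node layout and the remote distance of 21 are made-up values for illustration, not figures from Sun or Intel, and the function names are hypothetical.

```python
# Hypothetical SRAT-like data: which node (socket) each logical CPU sits on.
CPU_TO_NODE = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}

# Hypothetical SLIT-like matrix: SLIT[i][j] is the relative cost for node i
# to reach memory on node j. 10 means local; larger means farther away.
SLIT = [
    [10, 21],
    [21, 10],
]


def nodes_by_distance(cpu, cpu_to_node=CPU_TO_NODE, slit=SLIT):
    """List memory nodes from nearest to farthest as seen from this CPU."""
    home = cpu_to_node[cpu]
    return sorted(range(len(slit)), key=lambda node: slit[home][node])


def pick_memory_node(cpu, free_pages, cpu_to_node=CPU_TO_NODE, slit=SLIT):
    """Choose the nearest node that still has free pages.

    free_pages maps node number -> count of free pages on that node.
    """
    for node in nodes_by_distance(cpu, cpu_to_node, slit):
        if free_pages.get(node, 0) > 0:
            return node
    raise MemoryError("no free pages on any node")
```

With both nodes holding free pages, a thread on CPU 5 gets memory from node 1; only when node 1 runs dry does the allocation spill over to the more distant node 0.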
Given that the Nehalem EPs are only used in two-socket boxes, you might presume the two processor sockets are lashed together using regular old symmetric multiprocessing techniques. But perhaps they're more NUMA-like than many of us expect. The line between SMP and NUMA has been blurring for years, and hopefully Intel will clear up exactly how the Tylersburg 5520 chipset makes two processors share main memory.
Anyway, however it works, SRAT/SLIT support is already in the bi-weekly OpenSolaris builds and will be rolled into the next six-month release - which should be any day now - as well as into a Solaris 10 update - which, again, should appear soon.
Sun is also rolling out a feature in Solaris called the power aware dispatcher, or PAD, which knows about the various p-states, c-states, and deep c-states of the Nehalem EP processor, as well as the needs of the applications that are running. PAD optimizes these to maximize performance while at the same time reducing power.
The Nehalem processors have four cores, each with two hyperthreads for execution. They also have different combinations of voltages and clock frequencies at which they can run (p-states), core idling states (c-states), and complete power-off states (deep c-states). Then there's the new Turbo Boost, which can shut down one, two, or three inactive cores and crank up the clock speed on the remaining active cores.
That's a lot of different threads and states to keep track of to optimize performance - and it's not a job a system administrator can do unless he can type at the speed of light. Hence the PAD, which Hinstorff says can help reduce idle power consumption by 20 per cent without affecting application performance just by shifting workloads around the cores and threads in the Nehalem system.
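Sun didn't walk through the PAD's internals, so what follows is only a toy illustration of the general idea of shifting work around to save power: pack runnable threads onto as few cores as possible, filling both hyperthreads of one core before waking the next, so that untouched cores can stay idle. The function names and the packing policy are my own invention for illustration, not Sun's algorithm, and a real dispatcher would also have to weigh the performance cost of sharing a core.

```python
def place_threads(n_threads, cores=4, threads_per_core=2):
    """Return how many software threads land on each core.

    Fills core 0 completely, then core 1, and so on. Threads beyond the
    chip's capacity are not placed (they would simply wait in a run queue).
    """
    if n_threads < 0:
        raise ValueError("n_threads must be non-negative")
    remaining = min(n_threads, cores * threads_per_core)
    layout = []
    for _ in range(cores):
        take = min(remaining, threads_per_core)
        layout.append(take)
        remaining -= take
    return layout


def idle_cores(layout):
    """Count cores with no work, i.e. candidates for a deep idle state."""
    return sum(1 for used in layout if used == 0)
```

Three runnable threads on a four-core chip fill one core and half of another under this policy, leaving two cores with nothing to do and free to power down.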
Sun has also worked with Intel to port Intel's PowerTop tool to Solaris, which allows programmers and system tools to look inside the Nehalem chips and watch the Turbo Boost mode in real time. The Sun implementation of PowerTop marries it to DTrace (dynamic tracing), a key monitoring tool launched with Solaris 10 four years ago.
The PowerTop implementation for Solaris can be downloaded from the OpenSolaris repository and works with both Solaris 10 and OpenSolaris. ®