ISSCC If the Chinese government is scaring the world with its hybrid CPU-GPU clusters, what do you think the reaction will be when Chinese supercomputers shun American-made x64 processors and GPU co-processors and start using their own energy-efficient, MIPS-derived, x86-emulating Godson line of 64-bit processors?
Apoplexy? Disbelief? A polite bow of respect? A bunch of orders for Godson chips is more likely, once you see what China is up to.
One of the more interesting presentations at this week's International Solid-State Circuits Conference, hosted by the IEEE in San Francisco, was by Weiwu Hu, the lead designer of the Godson family of processors being created by the Institute of Computing Technology at the Chinese Academy of Sciences.
China has been developing its own processors since 2002, explained Hu, and the Godson family of chips, which is based on the MIPS architecture created by Silicon Graphics, is part of a holistic technology investment program. The Godson chip effort is, in fact, one of 16 different projects that are each funded with between $5bn and $10bn.
The massive projects focus on specific technology areas that China reckons are key for its technological independence and economic future, including processors and operating systems, chip process technology, 4G wireless networks, nuclear fission power plants, water pollution control and treatment, aircraft design and construction, high-resolution satellite imaging, and manned spaceflight and lunar exploration.
As El Reg reported a year ago when China's ICT was bragging about its plans to build a petaflops-scale supercomputer with server maker Dawning, ICT originally got access to MIPS technology through its partnership with wafer-baker STMicroelectronics. But in June 2009, as it got serious about its Godson chips (also known by the name Loongson) it licensed the MIPS32 and MIPS64 architectures straight from MIPS Technologies, the chip-designing division of Silicon Graphics that was spun out in an initial public offering in 1998.
The initial Godson-1 processors were 32-bit chips running at a mere 266 MHz, and the Godson-2 moved to 64 bits and was revved up to 1.2 GHz. With the Godson-2F chip in 2007 and 2008, ICT came out with a design that has a four-issue core running at 800 MHz, rated at 3.2 gigaflops. The Godson-3A chip was delayed nearly a year and was aimed solely at servers. ICT shifted to a four-core design and also did something else very clever: it added x64 instruction emulation right into the hardware. Hu only alluded to this emulation capability, but as El Reg explained a year ago, the Godson-3 chips have instructions added to help the QEMU hypervisor (the one that's at the heart of Red Hat's KVM hypervisor) to translate instructions from x86 to MIPS format. According to early benchmarks, the emulation penalty is about 30 per cent.
The Godson-3A chip was implemented in a 65 nanometer process and ran at 1 GHz to deliver 16 gigaflops of floating point oomph. The chip has 425 million transistors, an area of 174.5 square millimeters, and burned only 10 watts under load. The chip included two 16-bit HyperTransport ports (licensed from Advanced Micro Devices), 4 MB of L2 cache, and two on-chip memory controllers that support either DDR2 or DDR3 main memory.
With the Godson-3B, which is what Hu was there to talk about in San Francisco, ICT is sticking with the same 65 nanometer CMOS process and running the chip at the same 1 GHz. But the chip is bumped up to eight cores from four and has two 256-bit vector co-processors per core. The chip has two HyperTransport ports and two DDR3 memory controllers, and weighs in at 583 million transistors in a 300 square millimeter area. Running at 1 GHz, peak performance on those vector units is 128 gigaflops, with the chip only emitting 40 watts. According to early tests, the cores burn about 28.9 watts, while the uncore parts of the chip (HT, memory controllers, and crossbar switches for linking chips together) consume 11.1 watts.
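For those who like to check the sums, here is a rough sketch of how the 128 gigaflops and 40 watt figures hang together. Hu did not break the peak number down this way: the assumptions that the rating is for 64-bit double precision and that each vector lane retires a multiply-add (two flops) per clock are ours, as is the hypothetical `peak_gflops` helper.

```python
def peak_gflops(cores, vector_units_per_core, vector_bits, clock_ghz,
                element_bits=64, flops_per_lane=2):
    """Peak vector gigaflops for a chip.

    element_bits=64 (double precision) and flops_per_lane=2 (a
    multiply-add per lane per clock) are assumptions, not figures
    from Hu's talk.
    """
    lanes = vector_bits // element_bits  # 256-bit unit -> 4 double lanes
    return cores * vector_units_per_core * lanes * flops_per_lane * clock_ghz

# Godson-3B as described: 8 cores, two 256-bit vector units per core, 1 GHz
godson_3b = peak_gflops(cores=8, vector_units_per_core=2,
                        vector_bits=256, clock_ghz=1.0)

# Power split from the early tests quoted in the article
core_watts, uncore_watts = 28.9, 11.1
total_watts = core_watts + uncore_watts

print(godson_3b)                          # 128.0
print(round(total_watts, 1))              # 40.0
print(round(godson_3b / total_watts, 1))  # 3.2 peak gigaflops per watt
```

Under those assumptions the sums land on the quoted 128 gigaflops, which works out to about 3.2 peak gigaflops per watt.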
According to Hu, the vector extension units in the Godson-3B and Godson-2H processors have 128-entry, 256-bit register files, and more than 300 SIMD instructions have been added to the MIPS architecture to drive them.
Here's what the Godson-3B chip looks like:
The Godson-3B processor will be used in the Dawning 6000 petaflops supercomputer, which China will be tweaking in 2012. Here's an early version of the blade equipped for the Godson-3B chips:
Dawning's two-socket Godson-3A and Godson-3B blade server
And this is what the blade server chassis looks like for the Dawning 6000:
The Dawning 6000 supercomputer blade server chassis
The Dawning 6000 blade design is used by the National Supercomputing Center in Shenzhen for its hybrid Xeon 5650-Nvidia M2050 system, which ranked number three on the Top 500 list from November 2010. That machine had an aggregate 1.27 petaflops of sustained performance running the Linpack Fortran benchmark test.
Another Dawning 6000 blade cluster with 3,000 of the Godson-3B chips, and rated at around 300 sustained teraflops, is expected to be up and running this summer, Hu said. (That would be about 384 peak theoretical teraflops just counting the vector units, not the cores.)
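The peak figure in that parenthetical, and the Linpack efficiency it implies, can be sketched as follows. The efficiency is our own back-of-envelope ratio of the two numbers above, not something Hu quoted, and `cluster_peak_tflops` is a hypothetical helper.

```python
CHIPS = 3000                  # Godson-3B chips in the planned cluster
PEAK_GFLOPS_PER_CHIP = 128    # vector units only, at 1 GHz
SUSTAINED_TFLOPS = 300        # "around 300" sustained, per Hu

def cluster_peak_tflops(chips, gflops_per_chip):
    """Aggregate peak in teraflops (1 teraflops = 1,000 gigaflops)."""
    return chips * gflops_per_chip / 1000

peak = cluster_peak_tflops(CHIPS, PEAK_GFLOPS_PER_CHIP)
efficiency = SUSTAINED_TFLOPS / peak  # implied sustained-to-peak ratio

print(peak)                 # 384.0
print(f"{efficiency:.1%}")  # 78.1%
```

If the cluster does hit around 300 teraflops, that would be roughly 78 per cent of its vector peak.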
Those Dawning 6000 blades are by no means the highest density that ICT can come up with. Check out this system board for a 1U rack server that Hu showed off at ISSCC this week:
This IU2T system board packs 16 of the eight-core Godson-3B processors onto a single board, rated at 2 teraflops. A rack of these puppies would yield 42 teraflops. So instead of the hundreds of cabinets it can take to reach 1 petaflops of raw number-crunching performance with big x64-based machines, ICT could, in theory, do it with 24 racks.
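The density math can be sketched like so. The number of boards per rack was not given, so the figure below is merely what 42 teraflops per rack implies; it is an inference, not a spec.

```python
import math

CHIPS_PER_BOARD = 16
GFLOPS_PER_CHIP = 128   # Godson-3B vector peak at 1 GHz
RACK_TFLOPS = 42        # per-rack figure quoted in the article
TARGET_TFLOPS = 1000    # 1 petaflops

# 16 chips x 128 gigaflops is a shade over the "2 teraflops" board rating
board_tflops = CHIPS_PER_BOARD * GFLOPS_PER_CHIP / 1000

# Boards per rack implied by the quoted 42 teraflops (an inference)
implied_boards_per_rack = RACK_TFLOPS / board_tflops

# Whole racks needed to reach a peak petaflops
racks_needed = math.ceil(TARGET_TFLOPS / RACK_TFLOPS)

print(board_tflops)                       # 2.048
print(round(implied_boards_per_rack, 1))  # 20.5
print(racks_needed)                       # 24
```

Note that 42 teraflops works out to about 20 or 21 boards in a rack rather than a full 42 of them, which presumably leaves room for power, cooling, and switching.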
ICT is not going to stop here. The Godson-3C design will shift to a 28 nanometer process and will come in eight-core variants like the Godson-3B as well as a 16-core variant. The Godson-3C will have faster clock speeds, too, running at between 1.5 GHz and 2 GHz. ICT says the Godson-3C will deliver 512 gigaflops of raw performance on math work, and the way the math works, that is a doubling from moving from 1 GHz to 2 GHz and then a doubling again as the core count goes from 8 to 16. This chip is expected sometime around late 2012 or early 2013.
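As a quick sanity check on that scaling, here is a minimal sketch that assumes the per-core vector hardware stays the same as in the Godson-3B, so peak performance scales linearly with clock speed and core count. That assumption and the `scaled_gflops` helper are ours; ICT has not detailed the Godson-3C's vector units.

```python
def scaled_gflops(base_gflops, base_clock_ghz, base_cores,
                  new_clock_ghz, new_cores):
    """Scale a peak rating linearly with clock and core count.

    Assumes unchanged per-core vector resources, which is a guess
    rather than anything on the roadmap.
    """
    clock_factor = new_clock_ghz / base_clock_ghz
    core_factor = new_cores / base_cores
    return base_gflops * clock_factor * core_factor

# Baseline: Godson-3B, 128 gigaflops at 1 GHz with 8 cores
print(scaled_gflops(128, 1.0, 8, 2.0, 16))  # 512.0, the top-bin 16-core part
print(scaled_gflops(128, 1.0, 8, 1.5, 16))  # 384.0, 16 cores at the low end
print(scaled_gflops(128, 1.0, 8, 2.0, 8))   # 256.0, eight cores at 2 GHz
```

On that reading, the 512 gigaflops figure applies only to a 16-core chip running at the full 2 GHz.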
Wouldn't it be funny if Silicon Graphics started building systems with these Godson-3 chips? They could dust off Irix and take it out for a spin on some new iron and allow it to run x64-based Linux applications in emulation mode. ®