A year ago, at the annual Hot Chips conference for chip designers in Silicon Valley, a company called Tilera came out of stealth mode and launched its 64-core Tile64 mesh processor. The Tile64 chip takes multi-core to an extreme, and an on-chip iMesh network allows a grid of cores and memory controllers to compete with X64 or DSP processors doing a variety of work.
This week, Tilera is putting its second-generation chips into the field and is getting some traction among various IT suppliers, who want to put the Tile64 processors and their homegrown Linux environment to work.
The Tile64 chip announced last year and the TilePro64 and TilePro36 kickers announced this week are not based on any existing processor cores and their associated instruction sets. The chips embody a new core that was designed from the ground up to take advantage of mesh networking on each core. This creates a large pool of compute resources that can be dedicated to running a single instance of Linux and its applications or carved up on the fly into virtual Linux images, each isolated from other virtualized slices.
Before getting into the changes in the new TilePro chip, let's review the first-generation device. The Tile64 core is a 32-bit design (with a 16-bit mode) that employs RISC and VLIW concepts. It can issue three instructions per clock cycle, and the chip's speed ranges from 600 MHz to 1 GHz. Each core on the Tile64 chip has 64 KB of L2 cache as well as L1 data and instruction caches that are 8 KB in size each.
The switch that is at the heart of the Tile64 processor actually implements five different mesh networks - one each for memory access, streaming packet transfers, user data network, cache misses, and interprocess communications. Wrapped around the cores are four DDR2 main memory controllers, two Gigabit Ethernet ports, two PCI Express controllers, two 10 Gb/sec XAUI interfaces, and two flexible I/O interfaces to support peripherals such as compact flash memory or disk drives.
The whole shebang is implemented in a 90 nanometer process and made by Taiwan Semiconductor Manufacturing.
The Tile64 design is clever in a number of ways, which means it might see some use in IT devices near you someday soon. First, it does not use a bus architecture to talk to peripherals or to have processors and cache memory talk to each other. The iMesh network allows point-to-point communication between the cores and does away with buses, which require high clock speeds and lots of energy to deliver bandwidth and scale.
The Tile64 chip also uses the mesh network so the L2 caches on each core can be used like a giant L3 cache in a traditional design. Basically, any core can look into the L2 cache of any other core on the chip and treat the whole lot like a giant 5 MB L3 cache. While each core on the Tile64 chip can run its own complete instance of Linux, the cache coherency engendered in the mesh network means that a collection of cores can be set up to run an SMP instance of Linux, too.
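The 5 MB figure can be reproduced with a little arithmetic. The sketch below assumes that the cache sizes quoted earlier are per-core figures and that the total counts both L1 caches alongside the L2; the function name is purely illustrative.

```python
# Back-of-the-envelope check of the aggregate on-chip cache on the Tile64.
# Assumption: the 64 KB L2 and the two 8 KB L1 caches are per-core sizes,
# and the quoted total counts all three of them.

def aggregate_cache_mb(cores, l2_kb, l1_data_kb, l1_instr_kb):
    """Return the total cache across all cores, in megabytes."""
    per_core_kb = l2_kb + l1_data_kb + l1_instr_kb
    return cores * per_core_kb / 1024


if __name__ == "__main__":
    # 64 cores x (64 + 8 + 8) KB = 5,120 KB
    print(aggregate_cache_mb(cores=64, l2_kb=64, l1_data_kb=8, l1_instr_kb=8))  # 5.0
```

Under those assumptions, 64 cores at 80 KB apiece come to exactly 5 MB.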
Because the iMesh network controls all communication into and out of a core, a microcode feature called Multicore Hardwall Technology can partition a Tile64 into multiple virtual machines, allowing different instances of Linux and their applications to run on the chip and be isolated from each other. The Tile64 chip supports a variant of the Linux 2.6 kernel and has a tweaked version of the open source GNU C compiler and the open source Eclipse integrated development environment.
Hash for Home
With the TilePro kickers, Tilera is making some performance tweaks in the design as well as delivering a cut-down 36-core variant of the chip. Rather than moving to a new chip process and cranking the clock, Tilera is making the new TilePro chips in the same 90 nanometer process. The TilePro chips add another dedicated mesh network, this one for cache coherency management, which boosts performance, as does doubling the L1 data and instruction caches per core to 16 KB.
The chips also implement some electronics called "hash for home," which spreads data over the caches on the chip, eliminating hot spots where cores keep hammering the same caches. The new chips also have instructions added specifically for handling video and audio data (important for streaming appliances that will be using the chip) and other instructions for moving and copying data in memory.
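To make the "hash for home" idea concrete, here is a minimal sketch of how hashing a cache-line address could pick the core whose cache holds that line. Tilera has not said how its circuitry actually works, so the line size, the hash constant, and the function names are all assumptions made for illustration.

```python
# Illustrative sketch only: this is NOT Tilera's implementation.
# The cache-line size, hash constant, and all names are assumptions.

CACHE_LINE_BYTES = 64           # assumed cache-line size
TILE_BITS = 6                   # 2**6 = 64 tiles, as on a TilePro64
NUM_TILES = 1 << TILE_BITS
HASH_MULTIPLIER = 2654435761    # a common multiplicative-hash constant (assumed choice)


def home_tile(address):
    """Map a physical address to the tile that 'homes' its cache line."""
    line = address // CACHE_LINE_BYTES
    hashed = (line * HASH_MULTIPLIER) & 0xFFFFFFFF
    return hashed >> (32 - TILE_BITS)   # top bits give a tile number, 0..63


def tiles_touched(start, length):
    """Return the set of home tiles for each cache line in an aligned buffer."""
    return {home_tile(a) for a in range(start, start + length, CACHE_LINE_BYTES)}


if __name__ == "__main__":
    # A contiguous 256 KB buffer is spread over many tiles, not one hot cache.
    print(len(tiles_touched(0, 256 * 1024)), "of", NUM_TILES, "tiles used")
```

The point of the hash is that neighboring lines land on different tiles, so a core that hammers one region of memory spreads its traffic across the chip instead of pounding a single cache.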
The memory controllers on the TilePro chips also have memory striping - akin to RAID striping on disks - to reduce bottlenecks and a direct memory access feature to put data into cache memory without having to go through main memory. All of these and a number of other features on the chips have boosted power consumption by 5 percent, but the performance per watt of the chips is nearly double.
That's another way of saying performance is nearly double, and on real workloads, it's somewhere between a factor of 1.5 and 2.5 better than the first Tile64 chips. Significantly for Tilera's marketing efforts, the new TilePro64 running at 866 MHz has 35 times the performance per watt of a 3 GHz quad-core Xeon processor from Intel and 15 times the performance of Texas Instruments' DaVinci DSPs.
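The memory striping mentioned above works much like RAID 0: consecutive chunks of the address space rotate across the memory controllers. The sketch below assumes four controllers (the count given for the original Tile64) and a stripe size picked purely for illustration; Tilera has not disclosed the real granularity, and the names are hypothetical.

```python
# Illustrative sketch of address striping across memory controllers.
# NUM_CONTROLLERS follows the four DDR2 controllers on the Tile64;
# STRIPE_BYTES is an assumed value, not a published Tilera figure.
from collections import Counter

NUM_CONTROLLERS = 4
STRIPE_BYTES = 8192


def controller_for(address):
    """Return the controller that serves a given physical address."""
    return (address // STRIPE_BYTES) % NUM_CONTROLLERS


def controller_load(start, length):
    """Count how many stripes of a contiguous buffer land on each controller."""
    first = start // STRIPE_BYTES
    last = (start + length - 1) // STRIPE_BYTES
    return Counter(stripe % NUM_CONTROLLERS for stripe in range(first, last + 1))


if __name__ == "__main__":
    # A 128 KB buffer covers 16 stripes, four per controller.
    print(dict(controller_load(0, 128 * 1024)))
```

Without striping, a large contiguous buffer would sit behind a single controller; with it, the same buffer draws on the bandwidth of all four.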
The new TilePro64 chip has 64 cores and has 5.6 MB of distributed L2/L3 cache memory. It comes in 700 MHz and 866 MHz versions, and burns 19 to 23 watts when running real workloads. The TilePro36 is a cut-down version of the chip that runs at a much slower 500 MHz and has 3.2 MB of cache. It consumes 10 to 16 watts. The TilePro64 will begin sampling next month, and the TilePro36 will sample by the end of the year. First silicon of these chips was ready to play with in August. Tilera is working on a 120-core chip, due in late 2008 or early 2009, but said nothing more about it this week.
Supercomputing: Not an Option
What the Tile64 designs do not have are floating point math units and Fortran compilers. So forget supercomputing. But that doesn't mean the Tile64 chips won't see use in the data center. Right now, Tilera has 45 customers, who are messing around with the chips to see what they can do and how they might use them. While the names have to be kept secret, Bob Doud, director of marketing at the company, says that the company has sold over 100 system boards with the chips, which comprise hundreds of processors, and that the company is generating millions of dollars in sales as it makes its way to a ramp during the first half of 2009.
One prototype machine being built with the Tile64 chips is a 5U server with a dozen chips that does SQL database acceleration, and another supercomputer maker is playing around with the chip just in case there are workloads where it can be useful. A number of financial services companies are also looking at the chips to run their algorithms, which do not need floating point math. Media streaming is another area where companies are playing with the Tile64 chips, and so are intrusion detection and deep packet inspection on network devices. 3Com is an early user of the chips.
Tilera was founded in Santa Clara, California, in October 2004. The company's research and development is done in its Westborough, Massachusetts lab, which makes sense given that the Tile64 processor is based on an MIT project called Raw. The Raw project was funded by the U.S. National Science Foundation and the Defense Advanced Research Projects Agency, the research arm of the U.S. Department of Defense, back in 1996, and it delivered a 16-core processor connected by a mesh of on-core switches in 2002.
One of the key components of that Raw project was the compiler technology that could harness the multi-core architecture of the processor and the integrated switches that linked them together. Anant Agarwal - who worked on the first MIPS RISC processor at Stanford University in the 1980s and who had created a 32-node mesh-based cache coherent processor at MIT in 1994 - had tackled many of these problems. The team that created the Tile64 processor includes techies who worked on Sun Microsystems' Sparcle and Digital Equipment's Alpha RISC processors, too, as well as networking systems from Cisco Systems and supercomputers from Hewlett-Packard and the long-since defunct Thinking Machines - also an MIT spinout. ®