Teradata wants to keep its dominant position in data warehousing and analytics, so it is picking up the technology pace to take on Oracle with its Exadata appliances and IBM with its Netezza and Smart Analytic System appliances. The company is rolling out an improved Enterprise Data Warehouse product line and a separate product that mixes disk drives and flash memory to shoot the gap between its disk-based and flash-based appliances.
As El Reg previously reported, Teradata rolled out its first fully flash-based data warehouse box, the Extreme Performance Appliance 4600, last October. This machine pairs up solid state disk drives and two-socket Xeon X5670 server nodes from Dell to create a data warehousing box that delivers roughly an 18X improvement in decision support query rates compared to the basic Active Enterprise Data Warehouse cluster running the Teradata 13.10 database. Average query times on the fully flashed EPA 4600 cluster are about one-quarter of those on the disk-based Active EDW appliance.
But not everybody needs that kind of performance, or can live with the 17TB upper limit on data space that the 24-node EPA 4600 has. So Teradata is creating a new box that sits somewhere between the Active EDW and the EPA 4600 in its product line, offering a balanced mix of 3.5-inch 15K RPM Fibre Channel disks and SSDs.
To build this hybrid box, Teradata first went back to the drawing board and rejiggered the all-disk Active EDW design with the 6650 appliance, an upgrade to last year's Active EDW 5650. Scott Gnau, chief development officer at Teradata, tells El Reg that the 6650 appliance consumes about 25 per cent less energy and has about a 25 per cent smaller footprint than the 5650 appliance. To accomplish this, the 3.5-inch Fibre Channel disk storage is packed more tightly in the server nodes.
The Active EDW 6650 comes in two flavors. The 6650C appliance has only one of the two processor sockets in its server nodes populated, with a six-core 2.93GHz Xeon X5670 processor and 48GB of main memory per node. Each node can have up to four disk drives and uses 8Gb/sec Fibre Channel adapters to link out to additional storage. The Active EDW 6650C uses Teradata's BYNET V4 system interconnect to link cluster nodes together and scales up to 4,096 nodes; it is designed to co-exist with prior EDW boxes and offers up to 22.8TB per node of user data space with 42 to 124 disks per node.
The new Active EDW 6650H plugs in a second processor for a total of a dozen cores per node, doubles up main memory to 96GB on the server node, and allows for between 84 and 232 disks to be attached per node for up to 29.6TB of user space per node. This box also scales up to 4,096 nodes across that BYNET V4 network to spread out the data warehousing workloads, for a total capacity of 92PB.
Teradata's Active EDW 6650 and 6680
The Active EDW 6680 gets the flash drives, along with a significant performance boost on the I/O front and even denser packaging. This box is based on a two-socket Xeon X5670 configuration (that's the six-core chips running at 2.93GHz) with 96GB of main memory. Teradata is using 300GB SSDs from Pliant, with a total of a dozen SSDs per tray and three trays per cabinet. Gnau says that Teradata has tuned up four different SSD-disk node configurations, which vary depending on how much of a customer's data is hot, with the hot portion ranging from a low of 15 per cent to a high of 34 per cent of total customer data space. Teradata is using disk and SSD enclosures from LSI this time around. Using the BYNET V4 interconnect, the 6680 can scale up to 4,096 nodes; depending on the disk and SSD configuration, each node can have between 3.8TB and 7.3TB of combined disk and SSD capacity, which works out to between 15.6PB and 29.9PB of customer data space max.
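For anyone who wants to check the capacity math, here is a minimal Python sketch that multiplies the quoted per-node figures out across a maximum-size cluster. It assumes decimal units (1PB = 1,000TB), which is what makes the quoted totals line up; the function names are made up for illustration, and the SSD figure is raw capacity per cabinet before any RAID or formatting overhead.

```python
# Figures are the ones quoted in the article; decimal units are an assumption.
MAX_NODES = 4096
TB_PER_PB = 1000


def cluster_capacity_pb(tb_per_node, nodes=MAX_NODES):
    """Total customer data space in PB for a cluster of identical nodes."""
    return tb_per_node * nodes / TB_PER_PB


def ssd_capacity_per_cabinet_tb(ssd_gb=300, ssds_per_tray=12, trays_per_cabinet=3):
    """Raw SSD capacity per cabinet in TB (no RAID or formatting overhead)."""
    return ssd_gb * ssds_per_tray * trays_per_cabinet / 1000


if __name__ == "__main__":
    print(f"Smallest 6680 config: {cluster_capacity_pb(3.8):.1f} PB")
    print(f"Largest 6680 config:  {cluster_capacity_pb(7.3):.1f} PB")
    print(f"Raw SSD per cabinet:  {ssd_capacity_per_cabinet_tb():.1f} TB")
```

The two cluster totals round to the 15.6PB and 29.9PB quoted above.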
In general, the 80-20 rule applies, says Gnau, with about 20 per cent of the data using up about 80 per cent of the aggregate I/O. "You are always looking at last week's data, which is hot, and then comparing it to the year-ago data, which was cold, then gets hot for a while, and then gets cold again."
Because moving this data manually from disk to SSD and back again would be a nightmare, Teradata has cooked up a little something it calls Virtual Storage, which tracks the hotness and coldness of the data at a block level and automatically moves it back and forth between the media based on the demand from the queries smacking against it. This software was first released in the Teradata clustered database two years ago, according to Gnau. "This is not some vaporware," Gnau says emphatically. "It is out there in the field and it works."
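Teradata has not published how Virtual Storage makes its decisions, so what follows is only a toy sketch of the general idea: count accesses per block over a window, then keep the most-accessed blocks on flash. The class name `TieredStore`, its methods, and the simple "top N by access count" policy are all assumptions made for illustration, not Teradata's algorithm.

```python
from collections import Counter


class TieredStore:
    """Toy hot/cold block placement; NOT Teradata's actual algorithm."""

    def __init__(self, ssd_slots):
        self.ssd_slots = ssd_slots        # how many blocks fit on flash
        self.access_counts = Counter()    # block id -> accesses this window
        self.on_ssd = set()               # block ids currently on flash

    def record_access(self, block_id):
        """Note that a query touched this block."""
        self.access_counts[block_id] += 1

    def rebalance(self):
        """Place the hottest blocks on SSD; return (promoted, demoted) sets."""
        hottest = {b for b, _ in self.access_counts.most_common(self.ssd_slots)}
        promoted = hottest - self.on_ssd  # cold blocks that turned hot
        demoted = self.on_ssd - hottest   # hot blocks that cooled off
        self.on_ssd = hottest
        self.access_counts.clear()        # start a fresh measurement window
        return promoted, demoted


if __name__ == "__main__":
    store = TieredStore(ssd_slots=2)
    for block in ["last_week"] * 5 + ["year_ago"] * 3 + ["archive"]:
        store.record_access(block)
    print(store.rebalance())  # last_week and year_ago move to flash
    for block in ["last_week"] * 4 + ["archive"] * 2:
        store.record_access(block)
    print(store.rebalance())  # archive is promoted, year_ago is demoted
```

A production system would presumably decay counts over time rather than resetting them and would weigh the cost of each migration, but the promote-and-demote loop is the part Gnau is describing.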
Rack for rack, the Active EDW 6680 flash-enhanced appliances have about four times the throughput of the new Active EDW 6650 appliances; the 6680s have less capacity and user space, of course, but significantly higher I/O for a portion of their data.
Gnau says that a few customers have already installed the Active EDW 6680 flashy appliances, and that it is aimed at the belly of the market where flash will help on some queries – not at customers for the Extreme Performance Appliance 4600, where "all the customer's data is hot, all the time". Teradata is also using the 6680 appliances on the systems that track its quality metrics and manage its supply chain for its manufacturing operations.
While you can mix the Active EDW 5650 and 6650 nodes in a single Teradata cluster, it is not advisable to mix the 6680 flash-enhanced nodes with nodes that don't have flash because, as Gnau put it, a shared-nothing cluster only performs as fast as its slowest node, so the 6680 nodes would be tapping their feet about three quarters of the time.
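The three-quarters figure falls out of simple arithmetic, sketched below in Python. The sketch assumes that work is spread evenly across nodes, that every query waits for all of them, and that a 6680 node is about four times as fast as a 6650 node (borrowing the rack-for-rack throughput ratio, which may not hold exactly per node). The function names are hypothetical.

```python
def cluster_rate(node_rates):
    """Effective per-node rate when every query must wait for all nodes."""
    return min(node_rates)


def idle_fraction(node_rate, node_rates):
    """Share of time a node of the given speed spends waiting on the slowest."""
    return 1 - cluster_rate(node_rates) / node_rate


if __name__ == "__main__":
    # Relative speeds: 4.0 for a flash-enhanced node, 1.0 for a disk-only node.
    mixed_cluster = [4.0, 4.0, 1.0, 1.0]
    print(f"Fast nodes idle {idle_fraction(4.0, mixed_cluster):.0%} of the time")
```

Under those assumptions the fast nodes sit idle 75 per cent of the time, which is where the foot-tapping comes from.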
While Teradata did not release pricing information for the two new appliances, Gnau did give some guidance. The Active EDW 6650 costs less than the 5650 appliance announced last fall. The flash-enhanced Active EDW 6680 costs about the same as last year's 5650 appliance, but because it can do more work, it yields a much better cost per query than the 5650 did. The Active EDW 6680 has about a 75 per cent reduction in data center footprint and power consumption compared to a 5650 appliance yielding the same performance. ®