By Timothy Prickett Morgan, 29th July 2009 19:17

IBM outs BAO box speeds and feeds

Inside the 'smart' system

Yesterday, IBM launched the first of a number of integrated and optimized Smart Systems, this one aimed at data warehousing and predictive analytics for the business analytics and optimization (BAO) market. But the company said precious little about the feeds and speeds of what is inside it.

IBM admitted yesterday that the machine was based on its Power 550 midrange servers, that it ran a stack of Cognos software, and that it would run another stack of SPSS software when it ships at the end of September. But that's about it. This is what happens when you let software people do a hardware announcement.

But Scott Handy, vice president of marketing and strategy for IBM's Power Systems division, is very much interested in iron and isn't trying to hide the underpinnings of this Smart Analytics System - or, as we call it, the BAO box. It's just that the software people wanted to focus on the nature of the opportunity and on the general benefits of an integrated, tuned stack of data warehousing and analytics code running on a cluster.

As it turns out, the BAO box is based on a shared-nothing cluster architecture, with different machines in the cluster handling the various application modules and database functions performed by the system. The base server node in the cluster is a four-core Power 550 box using two-core Power6+ processors running at 5 GHz. This machine was upgraded last October so it could support up to eight cores, so there is room for expansion in the nodes for extra processing oomph if algorithms require it.

Each processor core has its simultaneous multithreading (SMT) electronics activated, which gives each core two threads for software to play with. The Power6+ chip used in the Power 550 has its AltiVec vector co-processors activated (they are not activated in two-socket Power 520 machines using Power6 processors, apparently).

This is one reason why IBM is basing the BAO box on the Power 550 instead of the Power 520. Each Power6 core has a decimal math unit, two floating point math units, and two integer units, as well as a VMX co-processor that hangs off the side of the Power6 core. All of these are no doubt brought to bear as much as possible by the optimized Cognos and SPSS software stacks on the BAO box.
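
The per-node thread count follows directly from the numbers above. The short Python sketch below just multiplies them out; the constant names are illustrative, and it assumes (this is not stated by IBM) that two-way SMT stays switched on if a node is expanded to its eight-core maximum.

```python
# Hardware-thread arithmetic for one BAO box server node, using the figures
# quoted above: two dual-core Power6+ chips with two-way SMT switched on.
CHIPS_PER_NODE = 2
CORES_PER_CHIP = 2
SMT_THREADS_PER_CORE = 2
MAX_CORES_PER_NODE = 8  # the Power 550's ceiling after last October's upgrade

cores = CHIPS_PER_NODE * CORES_PER_CHIP
hardware_threads = cores * SMT_THREADS_PER_CORE
# Assumption: SMT stays at two threads per core in a fully expanded node.
max_hardware_threads = MAX_CORES_PER_NODE * SMT_THREADS_PER_CORE

print(f"as shipped: {cores} cores, {hardware_threads} threads")
print(f"fully expanded: {MAX_CORES_PER_NODE} cores, {max_hardware_threads} threads")
```

So each base node gives the software eight hardware threads to play with, with headroom to double that.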

Each Power 550 in the box is configured with 32 GB of DDR2 main memory. It has two dual-port Gigabit Ethernet adapters for linking out to other nodes in the cluster and two dual-port 4 Gb/sec Fibre Channel adapters to link out to shared DS5300 disk arrays, which are cross-coupled to the four server nodes in the BAO box. According to Handy, the DS5300 is capable of feeding 1,200 MB/sec of throughput to each node in the cluster on sequential reads. IBM gets there by backstopping the DS5300 arrays, which have dual RAID 5 controllers and redundant paths to all server nodes, with lots of EXP5000 disk expansion drawers holding sixteen disks each.
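
A rough sanity check shows how that 1,200 MB/sec figure sits against the Fibre Channel links on each node. This sketch assumes each 4 Gb/sec Fibre Channel port moves roughly 400 MB/sec of payload in each direction (its usual nominal rating) and that all four ports carry read traffic at once; neither assumption comes from IBM, and if some ports are held back purely for failover the real ceiling is lower.

```python
# Rough check of the Fibre Channel headroom behind the 1,200 MB/sec
# sequential-read figure quoted for each node.
# Assumption: a 4 Gb/sec Fibre Channel port delivers ~400 MB/sec per direction.
FC_ADAPTERS_PER_NODE = 2
PORTS_PER_ADAPTER = 2
MB_PER_SEC_PER_PORT = 400  # assumed nominal 4 Gb/sec FC throughput
QUOTED_READ_MB_PER_SEC = 1200

fc_ports = FC_ADAPTERS_PER_NODE * PORTS_PER_ADAPTER
link_ceiling = fc_ports * MB_PER_SEC_PER_PORT
utilisation = QUOTED_READ_MB_PER_SEC / link_ceiling

print(f"{fc_ports} ports, ~{link_ceiling} MB/sec ceiling, "
      f"quoted rate uses {utilisation:.0%} of it")
```

Under those assumptions the quoted sequential-read rate uses about three quarters of the raw link capacity.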

The four server nodes in the base BAO box configuration - that's one administration node, one management node, one standby node, and one database node - are linked to the DS5300 arrays through redundant SAN40B switches. The DS5300 has eight active EXP5000 drawers, plus a ninth drawer holding sixteen drives as hot spares. Each server has thirty-two 146 GB, 15K RPM drives allocated to it in this base configuration, so the four-node setup has 18.7 TB of capacity, not including local storage in the server nodes.
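
The drive counts and the capacity figure hang together, as this small Python sketch shows. It only reuses the numbers quoted above and counts gigabytes and terabytes in decimal units, as drive makers do; the match suggests (our inference, not IBM's statement) that 18.7 TB is raw capacity before RAID 5 parity is taken out.

```python
# Disk arithmetic for the base (XS) BAO box storage, from the figures above.
ACTIVE_DRAWERS = 8      # the ninth drawer holds hot spares and is excluded
DRIVES_PER_DRAWER = 16
SERVER_NODES = 4
DRIVE_GB = 146          # decimal gigabytes

active_drives = ACTIVE_DRAWERS * DRIVES_PER_DRAWER
drives_per_server = active_drives // SERVER_NODES
raw_tb = active_drives * DRIVE_GB / 1000  # decimal terabytes, before RAID 5

print(active_drives, drives_per_server, f"{raw_tb:.1f} TB")
```

That works out to 128 active drives, 32 per server, and 18.7 TB.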

The server nodes in the BAO box are linked to each other and to the outside world through a set of 48-port EX4200-48T switches from Juniper Networks. These are plain old Gigabit Ethernet switches, not fancy schmancy 10 Gigabit or InfiniBand high-bandwidth, low-latency switches.

Yes, there's software too

And now for the software. All of the server nodes are running AIX 6.1 Service Pack 3 in 64-bit mode, as well as IBM's General Parallel File System V3.2.1 and Tivoli System Automation V3.1.0.3 for node management. The application nodes are running IBM's DB2 V9.5 database management system at the FP4 level, and one of the nodes runs the InfoSphere Warehouse 9.5.1 data warehouse.

One node on the cluster runs a stack of Cognos 8 modules, including BI Server, Go Dashboard, and BI Samples, all at the V8.4 FP2 level. Each of the three servers is set up with four logical data partitions, each of which has one Power6 core, 8 GB of memory, and eight disk drives attached to it. The partitioned database has a 4 TB user space. Customers can add hot standby data and application nodes to the cluster if they want to.
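
The partition layout accounts for each server's resources exactly, which a few lines of Python make plain. The `Partition` class is purely illustrative (it is not an IBM or AIX construct); the sketch just multiplies one partition's share by four.

```python
# Per-node totals implied by the logical partition layout described above:
# four partitions per server, each with one core, 8 GB and eight drives.
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:  # hypothetical helper type, for illustration only
    cores: int
    memory_gb: int
    drives: int


PARTITIONS_PER_SERVER = 4
partition = Partition(cores=1, memory_gb=8, drives=8)

server_totals = Partition(
    cores=partition.cores * PARTITIONS_PER_SERVER,
    memory_gb=partition.memory_gb * PARTITIONS_PER_SERVER,
    drives=partition.drives * PARTITIONS_PER_SERVER,
)
print(server_totals)
```

The totals come to four cores, 32 GB and 32 drives, the same as a full Power 550 node and its share of the DS5300 disks.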

IBM is offering the BAO box with a single product number, but it comes in six different sizes, all given generic (and American) T-shirt sizes to make it simple to talk about. The base box outlined above is the XS size, but to run the full software stack, it looks like you need to get the S size, which has three database nodes and an application server node and has a user space that spans up to 12 TB for the data warehouse to play in.

As the BAO box gets big enough, a special node is added to deal with user management, and standby data and application servers are added to the mix. The M, L, and XL configurations of the BAO box have 7, 14, and 27 database nodes and 25, 50, and 100 TB of user space, respectively. The top-end XXL version has a 200 TB user space and 53 database nodes, and the whole shebang requires 19 racks, including redundant boxes and storage.
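
Dividing user space by database node count shows how evenly the sizes scale. This Python sketch covers only the S to XXL sizes, whose node counts and user space are both quoted above; the dictionary and function names are illustrative.

```python
# User space per database node across the BAO box T-shirt sizes, using the
# node counts and user-space figures quoted above.
SIZES = {
    # size: (database nodes, user space in TB)
    "S": (3, 12),
    "M": (7, 25),
    "L": (14, 50),
    "XL": (27, 100),
    "XXL": (53, 200),
}


def tb_per_node(size: str) -> float:
    """Return the quoted user space divided by the database node count."""
    nodes, user_tb = SIZES[size]
    return user_tb / nodes


for size in SIZES:
    print(f"{size:>3}: {tb_per_node(size):.2f} TB per database node")
```

Every size lands between roughly 3.6 and 4 TB of user space per database node, so capacity grows close to linearly with node count.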

Because no two data warehouses are alike in terms of the data sets and the kinds of queries they need to handle, the number of users does not scale with the T-shirt size. IBM recommends that the XS configuration have 100 named users and that the XXL one have around 5,000 named users. But it also recommends that all but the largest setup have no more than 50 concurrent users running queries, and that the XXL size have no more than 100 concurrent users.
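
IBM's concurrency guidance boils down to a two-value lookup across the six sizes. The Python sketch below encodes it as such; the function name is hypothetical, and it checks only concurrent query users, not named users, for which IBM gave figures for just the XS and XXL sizes.

```python
# A minimal sketch of IBM's concurrency guidance as a lookup: every size but
# XXL should stay at or below 50 concurrent query users, XXL at or below 100.
RECOMMENDED_MAX_CONCURRENT = {
    "XS": 50, "S": 50, "M": 50, "L": 50, "XL": 50, "XXL": 100,
}


def within_concurrency_guidance(size: str, concurrent_users: int) -> bool:
    """Hypothetical helper: True if the planned load fits IBM's guidance."""
    return concurrent_users <= RECOMMENDED_MAX_CONCURRENT[size]


print(within_concurrency_guidance("M", 40))    # True
print(within_concurrency_guidance("XL", 75))   # False
print(within_concurrency_guidance("XXL", 75))  # True
```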

The Smart Analytics System will ship at the end of September, and it will eventually support the predictive analytics software that IBM is getting through its $1.2bn acquisition of SPSS. That deal is expected to close before the end of the year, and it is not clear how the SPSS software will affect the configuration of the BAO box. ®

