Clouds have a single point of failure, and Stratus Technologies thinks it can make it some dough fixing it.
Stratus and its peers NEC and Hewlett-Packard, which sell fault-tolerant servers, would love for you to be so freaked out by the mission-critical nature of your applications that you would get out a big ole check and buy their machines. Historically, this has been a tough sell for all but a handful of workloads, such as ATMs and other financial systems, police and fire communications systems, and certain parts of telecommunications and trading networks. The reason is that as workloads have become more distributed, and as replication and availability have been built into the software stack above the operating system, there has been less and less need for hardware-software fault tolerance below the operating system layer.
But eliminating single points of failure is not easy, and in the case of VMware's ESXi server virtualization hypervisor and the related vCloud Director infrastructure cloud fabric that rides atop it, that single point of failure is the vCenter management console, the out-of-band control freak that tells ESXi and its virtual machine containers what to do and where to do it.
Stratus and NEC are development partners on add-on electronics for Xeon 5500 and 5600 servers that put two x86 servers in absolute lockstep, bit for bit. High availability comes from hardware and software redundancy and lockstepping, not from replication over a network. Stratus and NEC add different software atop the common Xeon hardware, which the two vendors sell as the ftServer and the Express5800 R series, respectively.
Stratus certified VMware's freebie ESXi hypervisor to run atop its ftServers back in December 2008, and even tossed in a free copy of ESX Server 3 on its ftServer machines in March 2009. In October 2010, when Stratus announced support for Intel's Xeon 5600 processors in its ftServer iron, it also added support for Microsoft's Hyper-V hypervisor.
With a cloud of servers running hypervisors and having live migration capability to move running virtual machines between physical machines either before or after physical machines fail, the need for fault-tolerant servers would seem to be diminished. But there's still that one point of failure: the virtualization control console. In the VMware world it is vCenter; with Citrix Systems it's XenCenter; with Microsoft it's System Center; and with Red Hat KVM it is RHEV Manager.
These consoles are used to configure virtual machines on one or more hypervisors running on one or more physical servers. If you walk up to one of these consoles and shoot it repeatedly with a large caliber rifle, you will be living a fantasy of many sysadmins the world over. The death of such a control freak will not take down the hypervisors, which continue to run out of band, but many of the advanced features of the system – such as live migration and other cloud control layers – do require that control freak.
You don't want to lose that hypervisor management console, and that is one of the reasons why three years ago VMware launched a feature called Heartbeat, which is an OEMed version of a failover clustering tool from a company called Neverfail. vCenter is a Windows Server program, and you really want to have some sort of live backup plan for it. To get around paying for Heartbeat, some companies have been plunking vCenter into a virtual machine running on ESXi and then using the failover and replication features of vCenter to replicate vCenter itself. Yes, this sounds like a bad idea, and yes, it is still industry best practice to use a clustered Windows server to run vCenter.
Alternatively, you can buy the new Uptime Appliance for vCenter from Stratus, which is just a clever way of saying a rebadged ftServer 2000 series machine that has been configured to run vCenter in fault-tolerant mode. The Uptime Appliance is a pair of two-socket rack servers that have only one Xeon E5504 processor in each chassis; these are four-core chips running at 2GHz, and that is more than enough oomph to run vCenter, according to Denny Lane, director of product and marketing management at Stratus. The machine comes with four 146GB disks spinning at 15K RPM and 16GB of main memory per chassis, with each node running its own copy of Windows Server 2008 R2 Standard Edition, the Stratus software stack, and vCenter 5 Standard Edition. The whole shebang costs $24,627, and because it is called an "appliance", Lane says it is actually an easier sell into shops that would otherwise buy HP, Dell, IBM, or Fujitsu x86 iron. (Yes, that is silly. But that's Earth for you.)
"Anyone who has 25 hosts or more will see a big benefit from this appliance," says Lane. He estimates that there are tens of thousands of vCenter servers running out there in the world, and it would be funny if Stratus sold a lot more boxes for control freaks than it did for actual workloads.
The Uptime Appliance's price stacks up pretty well against setting up a Windows cluster running VMware Heartbeat and vCenter. Lane priced up two Hewlett-Packard ProLiant servers with similar processing, memory, and storage capacity, put Heartbeat on each node ($9,995 per node) plus vCenter, and came up with a total of $36,081.
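For a rough sense of the gap, the arithmetic on Lane's two quotes fits in a few lines of Python. This is a back-of-the-envelope sketch using only the list prices quoted above; it assumes the $9,995 Heartbeat price is paid once per node on a two-node cluster, and it ignores discounts, support contracts, power, and cooling. The function name `compare` is purely illustrative.

```python
# Back-of-the-envelope comparison using the list prices quoted by Stratus.
# Assumption: Heartbeat is licensed once per node, at $9,995 per node.

UPTIME_APPLIANCE = 24_627    # Stratus Uptime Appliance for vCenter, all-in
HP_CLUSTER_TOTAL = 36_081    # two ProLiant nodes + Heartbeat + vCenter
HEARTBEAT_PER_NODE = 9_995
NODES = 2


def compare(appliance: int, cluster: int, heartbeat_each: int, nodes: int) -> dict:
    """Return the dollar gap and how much of the cluster quote is Heartbeat."""
    savings = cluster - appliance
    heartbeat_total = heartbeat_each * nodes
    return {
        "savings": savings,
        "savings_pct": 100 * savings / cluster,
        "heartbeat_total": heartbeat_total,
        # The rest of the cluster quote: hardware, vCenter, and anything
        # else bundled into Lane's figure.
        "non_heartbeat": cluster - heartbeat_total,
    }


result = compare(UPTIME_APPLIANCE, HP_CLUSTER_TOTAL, HEARTBEAT_PER_NODE, NODES)
print(f"Appliance saves ${result['savings']:,} ({result['savings_pct']:.1f}%)")
print(f"Heartbeat licences: ${result['heartbeat_total']:,}; "
      f"everything else: ${result['non_heartbeat']:,}")
```

Run as-is, it reports a saving of $11,454, or about 31.7 percent, and shows that the two Heartbeat licences alone account for $19,990 of the cluster quote.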
Stratus is not making any commitments about whether it will create Uptime Appliances for the other server virtualization control freaks, but obviously what applies to vCenter applies equally well to XenCenter, System Center, and RHEV Manager. ®