Comment: Oracle is making hay over last weekend's mega six-hour Amazon Web Services (AWS) cloud outage. "You get what you pay for," tweeted Oracle's Phil Dunn, with the caveat that all views are his and don't necessarily reflect those of Oracle. But you get the point.
Yes, Amazon's been left with egg on its face and rivals will be exploiting the giant's stumble. There's oeuf, too, on the plate of some of the web's most hallowed – in Wall St and Silicon Valley circles at least – names.
Netflix, Tinder, Airbnb, and IMDb were all down or sputtering.
If anybody is venerated it's Netflix, arguably AWS's most high-profile customer – a reference customer, no less, served up on the AWS site as an example of just how far you can go if you stake absolutely everything on AWS.
Without AWS, it's doubtful Netflix would exist as we know it. Netflix would have had to continue investing in its own data center build program and the rocket scientist brains to build the elastic magic. If it had, perhaps Netflix would now be Amazon, selling its spare capacity and expertise to others. Instead, it outsourced it to Amazon.
In another time, people would have cautioned strongly against relying on a single supplier for your critical IT needs. On the web, that's thrown out the window.
But doesn't Netflix learn? It moved to AWS following a staggering outage at its own data center in 2008 and when faced with the prospect of huge growth in its business. Netflix decided it was best to place its faith in the pros. And yet Netflix still went down with AWS in a big way in April 2011. Now Netflix is reported to be closing its last own-operated data center in favor of AWS.
But hang on. Amazon isn't the only one that should hang its head in shame here. There's plenty of egg for other faces in this game – Microsoft with Azure and Office 365, Salesforce, Google – all down over the last few years for periods ranging from a few hours to entire days. To them it's nothing – a mere statistical rounding error in their pledge to maintain 99.999 per cent uptime. But to those on the sharp end, it's lost business.
Netflix garners incredulous headlines just because couch potatoes must do the unthinkable and change channels or go outside. But to the broader mass, to thousands of businesses, it means literally not being able to do business: no ERP to manage production or suppliers, no CRM to run sales or talk to customers, no email to talk to colleagues. Your only option is to twiddle your thumbs between hitting refresh on the status page. That statistical rounding error suddenly looks big when you're up close to it.
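To put a number on that rounding error, here is a back-of-the-envelope sketch. It takes the 99.999 per cent figure and the six-hour outage quoted above at face value, and assumes a non-leap year with no other downtime; the function names are illustrative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Minutes of downtime per year that an uptime pledge permits."""
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


def achieved_uptime_percent(outage_minutes: float) -> float:
    """Yearly uptime if the given outage were the only downtime all year."""
    return 100 * (1 - outage_minutes / MINUTES_PER_YEAR)


five_nines_budget = allowed_downtime_minutes(99.999)  # roughly 5.3 minutes
six_hour_outage = achieved_uptime_percent(6 * 60)     # roughly 99.93 per cent

print(f"Five nines allows {five_nines_budget:.2f} minutes of downtime a year")
print(f"One six-hour outage leaves {six_hour_outage:.2f} per cent uptime")
```

On those assumptions, a single six-hour outage uses up close to 70 years' worth of a five-nines downtime budget.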
Customers have been handing this infrastructure over to providers of public clouds, having convinced themselves that the providers are the ones who know best. Uptime and servers are what they do, so they can run this stuff better than you.
Which makes it more confounding when the experts manage to screw up planned maintenance.
Or when, as in the case of AWS at the weekend, they fail to read the growing evidence of the popularity of their own products, here Amazon's DynamoDB NoSQL database service. Demand for Global Secondary Indexes put too much of a strain on the metadata servers, forcing systems to stall. Worse, Amazon hadn't anticipated this could be a problem, so its monitoring service wasn't set up to recognize it as a failure.
It's not all doom and gloom
But there is a bright spot.
Despite the embarrassment of Netflix this time around, there were plenty of other AWS customers last weekend who didn't seem to suffer – among them, News UK. Rupert Murdoch's news operation runs the paywall, *ahem* "access control system" and the tablet and web versions of the Times, Sunday Times, and The Sun on AWS. They didn't go down.
The best way to avoid going dark is to architect your service to fail over to different nodes within a region. Even better, different regions. The AWS outage was centered on the giant's US-East region – it has eight others across the planet.
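The failover idea can be sketched in a few lines. This is a minimal illustration, not how any particular AWS customer does it: `call_with_failover` and `fetch_listing` are hypothetical names, the request function stands in for whatever client call your service makes, and the outage is simulated.

```python
from typing import Callable, Dict, Sequence, TypeVar

T = TypeVar("T")


class AllRegionsFailed(Exception):
    """Raised when every region in the preference list has failed."""

    def __init__(self, errors: Dict[str, Exception]) -> None:
        super().__init__(f"all regions failed: {sorted(errors)}")
        self.errors = errors


def call_with_failover(regions: Sequence[str], request: Callable[[str], T]) -> T:
    """Try each region in order and return the first successful response."""
    errors: Dict[str, Exception] = {}
    for region in regions:
        try:
            return request(region)
        except Exception as exc:  # a real client would catch narrower errors
            errors[region] = exc
    raise AllRegionsFailed(errors)


# Simulated outage: the hypothetical backend in us-east-1 refuses connections.
def fetch_listing(region: str) -> str:
    if region == "us-east-1":
        raise ConnectionError("metadata service unavailable")
    return f"served from {region}"


result = call_with_failover(["us-east-1", "us-west-2", "eu-west-1"], fetch_listing)
print(result)  # served from us-west-2
```

The sketch only covers routing requests. It assumes the data is already replicated to the fallback regions, which is the harder and more expensive part.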
You can build a caching layer above your chosen cloud platform to carry copies of your data. It's like the time delay on TV and radio to stop naughty words leaking over the airwaves. You work with the cached data until the underlying store comes back, at which point your copy replicates and updates.
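A minimal sketch of that caching layer, covering reads only, might look like this. `StaleOnErrorCache` and `fetch_from_cloud` are hypothetical names, the backing store is a plain dictionary standing in for a cloud database, and the outage is simulated with a flag.

```python
from typing import Callable, Dict, Tuple


class StaleOnErrorCache:
    """Read-through cache that serves its last good copy when the backend fails."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._copies: Dict[str, str] = {}

    def get(self, key: str) -> Tuple[str, bool]:
        """Return (value, is_stale); is_stale is True when served from the copy."""
        try:
            value = self._fetch(key)
        except Exception:
            if key in self._copies:
                return self._copies[key], True
            raise  # nothing cached for this key, so the failure surfaces
        self._copies[key] = value  # refresh the copy on every successful read
        return value, False


# Stand-in for the cloud store, with a switch to simulate an outage.
backend = {"headline": "v1"}
state = {"up": True}


def fetch_from_cloud(key: str) -> str:
    if not state["up"]:
        raise ConnectionError("cloud store unreachable")
    return backend[key]


cache = StaleOnErrorCache(fetch_from_cloud)
fresh = cache.get("headline")          # ("v1", False)
state["up"] = False
during_outage = cache.get("headline")  # ("v1", True), served from the copy
state["up"] = True
backend["headline"] = "v2"
recovered = cache.get("headline")      # ("v2", False), copy updated
```

Writes made during an outage would need queuing and reconciliation, which this sketch leaves out.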
Another option is to not rely on a single cloud provider. This is an option more open to users of IaaS and PaaS rather than SaaS.
CIOs The Reg talks to aren't going with a single IaaS or PaaS provider; they are going with more than one in a dual-cloud strategy. They might, say, pick AWS for compute and Google for storage. Or AWS for one business unit and Microsoft Azure for another, where there are competitive concerns. There are, of course, other regional and global public cloud firms available.
This is good for failover, yes, but also to help ensure you don't become hostage to one provider. Keeping multiple suppliers in the mix gives you leverage on price and helps ensure that your public cloud providers don't start to take you for granted.
Cassandra database provider DataStax told me recently that it's seen use of dual cloud grow during the past 12 months.
DataStax this week joined Microsoft's Enterprise Cloud Alliance, meaning its Cassandra implementation is integrated with Azure for automated, wizard-driven deployment. The company also announced DataStax Enterprise 4.8. DataStax was already on Azure and is also on AWS.
Billy Bosworth, DataStax chief executive, said: "We are seeing increasing examples where customers want to have multiple cloud vendors. People are looking at not putting all their eggs in one cloud basket, but looking at the pros and cons of each solution."
The endgame is portability for genuine load failover in the event of emergency.
That is, moving your data or your applications between IaaS providers the same way you'd switch traffic between data center providers in the event a server, data center, or network connection goes down. Some claim to offer this, but it's not clear how. Further, the actual moving of data is what costs real money. Public cloud providers let you check in new data any time you like, and the room charge is small if the data doesn't change. The charges only rack up when you move your bits around their cloud – or want to extract them.
Heads should roll at Netflix for its over-dependence on AWS. Increasingly, its status as an all-in-on-AWS pioneer is hurting it. There are options in the IaaS and PaaS world. It's time to diversify. ®