

By Simon Sharwood, 19th August 2015 02:28

Act of God damaged data on Google cloud disks

Old storage systems failed to recover some recent writes after lightning strikes

Google has admitted that some customers running Persistent Disks in its europe-west1-b zone have been forced to recover data from snapshots, and that a combination of lightning and old storage kit is to blame.

The outage hit last Friday and left some users unable to connect to Persistent Disks – disks that exist independently of a virtual machine – for several hours. Problems persisted across the weekend.

Google's now published its analysis of the fault and says that on August 13th, “four successive lightning strikes on the electrical systems of a European datacenter caused a brief loss of power to storage systems which host disk capacity for GCE instances in the europe-west1-b zone.” [We've since been told by Google folk that post isn't correct, and that the local grid, not Google's bit barn, was hit by lightning.]

“Although automatic auxiliary systems restored power quickly, and the storage systems are designed with battery backup, some recently written data was located on storage systems which were more susceptible to power failure from extended or repeated battery drain,” Google 'fesses up.

“In almost all cases the data was successfully committed to stable storage, although manual intervention was required in order to restore the systems to their normal serving state. However, in a very few cases, recent writes were unrecoverable, leading to permanent data loss on the Persistent Disk.”

About five per cent of disks in the data centre recorded “at least one I/O read or write failure” during the incident. Read failures persisted into Monday for about 0.05 per cent of users and Google now says just 0.000001 per cent of disk space has proved impossible to recover.
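
For a sense of scale, here is a rough sketch of what that last figure means in bytes. Google did not say how much disk space the zone holds, so the one-petabyte capacity below is purely an assumption for illustration, as is the helper function's name.

```python
from fractions import Fraction


def unrecoverable_bytes(total_bytes: int, percent: Fraction) -> Fraction:
    """Return the number of bytes that `percent` per cent of `total_bytes` represents."""
    return total_bytes * percent / 100


# Google's reported figure: 0.000001 per cent of disk space.
LOST_PERCENT = Fraction(1, 1_000_000)

# ASSUMPTION: a hypothetical capacity of one petabyte (10**15 bytes).
# This is not a figure published by Google.
ASSUMED_CAPACITY = 10**15

lost = unrecoverable_bytes(ASSUMED_CAPACITY, LOST_PERCENT)
print(f"{float(lost) / 10**6:.0f} MB unrecoverable per assumed petabyte")
```

Exact fractions are used rather than floats so that a percentage this small does not pick up rounding error along the way.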

Which isn't a bad result, even if plenty of customers were inconvenienced, especially as either snapshots or other backups would have allowed restoration.

“This outage is wholly Google's responsibility,” the document continues, but then goes on “... to highlight an important reminder for our customers: GCE instances and Persistent Disks within a zone exist in a single Google datacenter and are therefore unavoidably vulnerable to datacenter-scale disasters.”

In other words, should lightning strike twice, you should remember that a datacentre in the hand can't beat two in the bush.

Google's confessional also says the company “has an ongoing program of upgrading to storage hardware that is less susceptible to the power failure mode that triggered this incident. Most Persistent Disk storage is already running on this hardware.” The company adds that it's conducted a review of the incident and “Several opportunities have been identified to increase physical and procedural resilience.” ®
