The Channel

News

By Paul Kunert, 21st September 2011 11:31

Microsoft cloud evaporated by one busted file

Services failed for hours

A corrupted file in Microsoft's DNS services brought down its cloud across the world, the software giant has revealed.

In a dramatic failure, Office 365 and Windows Live services including Hotmail and SkyDrive fell over for more than three hours earlier this month, causing further embarrassment for Redmond.

No customer data was lost or compromised during the outage, according to a blog post penned by Arthur de Haan, Microsoft vice president for Windows Live Test and Service Engineering. He went on to detail the cause of the outage.

"A tool that helps balance network traffic was being updated and the update did not work correctly. As a result, configuration settings were corrupted, which caused a service disruption," he wrote.

It took some hours for normal service levels to resume, and further time for the changes to replicate across the planet.

De Haan said the file corruption had been caused by two "rare conditions" that happened at the same time, which were "tracked" to the networking device firmware used in Microsoft's DNS service.

"The first condition is related to how the load-balancing devices in the DNS service respond to a malformed input string (ie, the software was unable to parse an incorrectly constructed line in the configuration file).

"The second condition was related to how the configuration is synchronised across the DNS service to ensure all client requests return the same response regardless of the connection location of the client," said De Haan.

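The two conditions De Haan describes, a line the software cannot parse and a configuration that is then synchronised everywhere, suggest a general safeguard: check every line before anything is replicated. The sketch below illustrates that idea only. The "key = value" line format and the names parse_config and safe_to_replicate are assumptions made for this example, not Microsoft's configuration format or tooling.

```python
import re

# Hypothetical line format "key = value"; NOT Microsoft's actual format.
LINE_RE = re.compile(r"^\s*([A-Za-z_][\w.-]*)\s*=\s*(\S.*?)\s*$")


def parse_config(text):
    """Return (settings, errors).

    errors holds (line_number, raw_line) for every line that fails to parse,
    so a malformed line is reported instead of silently corrupting settings.
    """
    settings, errors = {}, []
    for number, raw in enumerate(text.splitlines(), start=1):
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = LINE_RE.match(raw)
        if match is None:
            errors.append((number, raw))
            continue
        settings[match.group(1)] = match.group(2)
    return settings, errors


def safe_to_replicate(text):
    """Allow a config to be pushed to other nodes only if every line parsed."""
    _, errors = parse_config(text)
    return not errors
```

In this sketch a file containing the line "ttl 300" (no equals sign) would be rejected before replication, rather than being copied to every node.
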
Microsoft said it has pinpointed "two streams of work" to improve the service around monitoring, problem identification and recovery, "further hardening the DNS service to improve its overall redundancy and fall-over capability".

De Haan added that the firm is also developing another recovery process and reviewing its recovery tools to cut the time it takes to restore service after an outage.

"We are determined to deliver the very best possible service to our customers and regret any inconvenience caused by this outage," he said.

The Advertising Standards Authority is currently investigating a customer complaint about claims that Microsoft has made for its cloud services. ®
