

By Chris Mellor, 16th January 2012 09:01

Flash drive meltdown fingered in Swedish IT blackout

Tieto's EMC VNX5700 array sparked 5-day disarray - new claim

Tieto's five-day outage disaster started with multiple failures of its EMC VNX5700 array's FAST Cache, according to a Finnish source close to the matter.

Tieto is a major IT services organisation across Scandinavia and the Nordic region, although it also provides services globally, and pulls in net sales of SEK17bn (£1.59bn). Its large customer base in Sweden meant that when it suffered a five-day outage in November, it caused chaos for IT services across that country. The stoppage was caused by failures in an EMC storage array and compounded by an inadequate disaster recovery plan involving NetWorker tape backup files which could not be read. The circumstances were not clear and seemed to involve a VNX array being upgraded to an NS480 (Celerra) system for flash, which is a logical nonsense.

El Reg has been sent a Tieto slide deck (PDF) describing why the service provider migrated from its Celerra NS480 to a VNX5700 and the resulting performance improvements: namely lower latency and more IOPS. The deck is in Swedish, but Google Translate gets around that little problem.

Based on the translated slide deck text, the story goes like this: in the 2010/2011 period, with an EMC Celerra NS480 array, Tieto saw its storage challenges as performance, response time, scalability and capacity. So it migrated from RAID (4 + 1) groups to Thick Pools composed of 60 disks and began to segment data types into Fibre Channel and NAS. The next step was to install EMC's FAST Cache with four 200GB SSDs and the cache license, which was beneficial as response times were more than halved to less than 20ms. However, the NS480 CPUs were maxed out.

Tieto upgraded to a VNX5700, but retained the 4 x 200GB SSD capacity, the FAST Cache license and the 60-disk Thick Pool, although the disks changed from 450GB FC drives to 600GB SAS ones. 14 x 1.04GB chunks were created in each pool and only FC block access was allowed. The outcome was a boost in IOPS and a further reduction in latency, as shown in the chart.


Chart showing the IOPS increase and latency decrease with the move from the NS480 to FAST Cache and then to the VNX5700
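To put the configurations above into numbers, here is a back-of-the-envelope capacity sketch in Python. It is not taken from Tieto's deck: it assumes decimal gigabytes, treats the whole 60-disk pool as RAID 5 (4+1) groups, and assumes FAST Cache pairs its SSDs as RAID 1 mirrors. The function names are made up for illustration.

```python
# Back-of-the-envelope capacity arithmetic for the Tieto configurations.
# Assumptions (not from Tieto's deck): decimal GB, every pool disk sits in a
# 4+1 parity group, and FAST Cache SSDs are used as RAID 1 mirrored pairs.

def raw_capacity_gb(disks: int, disk_gb: int) -> int:
    """Raw capacity before any RAID overhead."""
    return disks * disk_gb


def raid5_usable_gb(disks: int, disk_gb: int, data: int = 4, parity: int = 1) -> int:
    """Usable capacity if disks are grouped as data+parity (assumed 4+1)."""
    groups = disks // (data + parity)
    return groups * data * disk_gb


def mirrored_cache_gb(ssds: int, ssd_gb: int) -> int:
    """Usable cache if SSDs are paired as RAID 1 mirrors (assumption)."""
    return (ssds // 2) * ssd_gb


if __name__ == "__main__":
    for label, disk_gb in (("NS480, 450GB FC", 450), ("VNX5700, 600GB SAS", 600)):
        print(f"{label}: raw {raw_capacity_gb(60, disk_gb)}GB, "
              f"4+1 usable {raid5_usable_gb(60, disk_gb)}GB")
    print(f"FAST Cache: raw {raw_capacity_gb(4, 200)}GB, "
          f"mirrored {mirrored_cache_gb(4, 200)}GB")
```

Under those assumptions the disk swap raises raw pool capacity from 27,000GB to 36,000GB, while the unchanged four SSDs provide 800GB raw, or 400GB of mirrored cache.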

So here we have the basic VNX5700 array setup in which the hardware failures that led to the five-day debacle took place. EMC won't comment on any details, having referred us to the Tieto statement seen in our article yesterday. Our source said, for what it's worth: "What basically happened (in my understanding from Twitter rumours) is that Tieto had multiple SSD failures on [its] VNX5700 array Fast Cache, this resulted in data loss."
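The source's explanation implies that losing more than one SSD defeated FAST Cache's protection. A toy Python model shows why "multiple" failures matter. It assumes the four SSDs form two RAID 1 mirrored pairs holding writes not yet flushed to disk, so cached data on a pair is gone only once both of its members fail. Neither EMC nor Tieto has confirmed this mechanism, and the pair layout and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical layout (an assumption, not a documented Tieto or EMC setup):
# SSD 0 mirrors SSD 1, SSD 2 mirrors SSD 3, and so on.
def mirror_pairs(ssds: int) -> list[tuple[int, int]]:
    return [(i, i + 1) for i in range(0, ssds, 2)]


def pair_lost(failed: set[int], pairs: list[tuple[int, int]]) -> bool:
    """True if both copies of any mirrored pair have failed."""
    return any(a in failed and b in failed for a, b in pairs)


def loss_fraction(ssds: int, failures: int) -> float:
    """Share of all equally likely failure combinations that lose a whole pair."""
    pairs = mirror_pairs(ssds)
    scenarios = list(combinations(range(ssds), failures))
    losing = sum(pair_lost(set(s), pairs) for s in scenarios)
    return losing / len(scenarios)


if __name__ == "__main__":
    for k in range(1, 5):
        print(f"{k} failed SSD(s): {loss_fraction(4, k):.2f} of combinations lose a pair")
```

With four SSDs, a single failure never loses a pair, two simultaneous failures do so in one combination out of three, and any three failures always do. Whether data is actually lost also depends on how much unflushed write data sits in the cache at that moment, which this toy model ignores.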

What needs to be stressed is that Tieto's DR processes were dreadfully inadequate and obviously untested for the eventuality of such a failure. Lawsuits over data loss and business interruptions at Tieto's affected customers are bound to follow. ®
