By Chris Mellor, 16th January 2012 09:01

Flash drive meltdown fingered in Swedish IT blackout

Tieto's EMC VNX5700 array sparked 5-day disarray - new claim

Tieto's five-day outage disaster started with multiple failures of its EMC VNX5700 array's FAST Cache, according to a Finnish source close to the matter.

Tieto is a major IT services organisation across Scandinavia and the Nordic region – although it also provides services globally – and pulls in net sales of SEK17bn (£1.59bn). Its large customer base in Sweden meant that its five-day outage in November caused chaos for IT services across that country. The stoppage was caused by failures in an EMC storage array and compounded by an inadequate disaster recovery plan that relied on Networker tape backup files which could not be read. The circumstances are not clear and seemed to involve a VNX array with an upgrade to an NS480 (Celerra) system for flash, which makes no logical sense.

El Reg has been sent a Tieto slide deck (PDF) describing why the service provider migrated from its Celerra NS480 to a VNX5700 and the resulting performance improvements: namely lower latency and more IOPS. This deck is in Swedish but Google Translate gets around that little problem.

Based on the translated slide deck text, the story goes like this: in the 2010/2011 period, with an EMC Celerra NS480 array, Tieto saw its storage challenges as performance, response time, scalability and capacity. So it migrated from RAID (4 + 1) groups to Thick Pools composed of 60 disks and began to segment data types into Fibre Channel and NAS. The next step was to install EMC's FAST Cache with four 200GB SSDs and the cache license, which was beneficial: response times were more than halved, to less than 20ms. However, the NS480's CPUs were maxed out.

Tieto upgraded to a VNX5700, but retained the 4 x 200GB SSD capacity, the FAST Cache license and the 60-disk Thick Pool, although the disks changed from 450GB FC to 600GB SAS ones. 14 x 1.04GB chunks were created in each pool and only FC block access was allowed. The outcome was a boost in IOPS and a further reduction in latency, as shown in the chart.

[Chart: Tieto VNX5700. IOPS increase and latency decrease with the move from NS480 to FAST Cache and then to VNX5700]

So here we have the basic VNX5700 array setup in which the hardware failures that led to the five-day debacle took place. EMC won't comment on any details, having referred us to the Tieto statement seen in our article yesterday. Our source said, for what it's worth: "What basically happened (in my understanding from Twitter rumours) is that Tieto had multiple SSD failures on [its] VNX5700 array Fast Cache, this resulted in data loss."

What needs to be stressed is that Tieto's DR processes were dreadfully inadequate and obviously untested for the eventuality of such a failure. Lawsuits over data loss and business interruptions at Tieto's affected customers are bound to follow. ®
