By Paul Kunert, 31st August 2016 10:33

Cloudy biz Vesk suffers 2-day outage – then boasts of 100% uptime

HDD failure led to 'split brain event' – for systems and techies alike

A failed storage controller caused a protracted outage at hosted desktop and cloud slinger Vesk - not that this factoid has made its way onto the company’s website, where it boasts of 100 per cent uptime for the past 1,583 days.

Vesk, acquired by London-listed Nasstar plc in October, has written to customers in a bid to explain service problems that first showed up at lunchtime on 26 August and lasted into the wee hours of the 27th.

“We suffered from a system failure which resulted in loss of access to emails and certain dedicated instances hosted on the same platform,” the company stated in a letter to customers, seen by El Reg.

The monitoring platform noted a rise in server resource consumption and troubleshooting began. The infrastructure team were then alerted to a “storage controller failure” as users reported Outlook and email wobbles.

Specifically, a failed hard disk in the Storage Area Network caused a “panic event” on the primary controller that triggered a failover between two storage controllers.

The storage fail led to a “split brain event” and subsequent “levels of corruption within each virtual desk as they were being served by independent controllers,” Vesk said in the letter.

To repair the corruption, the platform was taken down at the end of normal working hours. According to the letter, “there were however, too many clusters of corrupted bad blocks to repair, and the timeframes indicated that the process would take many days to complete.”

The decision was taken at 11pm to invoke the DR plan and “failover to a secondary data centre site”.

All affected services were dragged online again early the next morning, though some of Vesk’s Exchange and dedicated SharePoint databases had "failed to start".

"As a result of our documented disaster recovery plans and procedures, [we] were able to keep the downtime to a minimum for the majority of the affected environments," the company claimed.

Vesk said it is reviewing the configuration applied on the storage controllers to figure out how to introduce further fail-safes in the future and arrange a plan to switch back to the primary DC from the DR platform.

On its website, Vesk claimed it had had 100 per cent uptime for all of 2012, ’13, ’14, ’15 and even 2016, despite us still having a full quarter of the year to go. ®
