Cloudy biz Vesk suffers 2-day outage – then boasts of 100% uptime

HDD failure led to 'split brain event' – for systems and techies alike

Wed 31 Aug 2016 // 10:33 UTC

A failed storage controller caused a protracted outage at hosted desktop and cloud slinger Vesk - not that this factoid has made its way onto the company’s website, where it boasts of 100 per cent uptime for the past 1,583 days.

Vesk, acquired by London-listed Nasstar plc in October, has written to customers in a bid to explain service problems that first showed up at lunchtime on 26 August and lasted into the wee hours of the 27th.
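For a sense of scale, here is a back-of-the-envelope sketch of what an outage of that length does to an uptime figure measured over 1,583 days. The outage duration is an assumption: the letter gives only "lunchtime" and "wee hours", so the 14 hours used below (noon to 2am) is purely illustrative, and the function name is ours.

```python
def availability_percent(window_days: float, outage_hours: float) -> float:
    """Percentage of the window during which the service was up."""
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    window_hours = window_days * 24
    return 100.0 * (1 - outage_hours / window_hours)


CLAIMED_DAYS = 1583         # the uptime streak quoted on the website
ASSUMED_OUTAGE_HOURS = 14   # hypothetical: noon on the 26th to 2am on the 27th

if __name__ == "__main__":
    pct = availability_percent(CLAIMED_DAYS, ASSUMED_OUTAGE_HOURS)
    print(f"{pct:.2f}%")
```

Under that assumption the figure comes out at roughly 99.96 per cent. Any non-zero outage puts it below 100.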

“We suffered from a system failure which resulted in loss of access to emails and certain dedicated instances hosted on the same platform,” the company stated in a letter to customers, seen by El Reg.

Vesk's monitoring platform flagged a rise in server resource consumption and troubleshooting began. The infrastructure team were then alerted to a “storage controller failure” as users reported Outlook and email wobbles.

Specifically, a failed hard disk in the Storage Area Network caused a “panic event” on the primary controller that triggered a failover between two storage controllers.

The storage fail led to a “split brain event” and subsequent “levels of corruption within each virtual desk as they were being served by independent controllers,” Vesk said in the letter.
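For readers unfamiliar with the term, the toy model below sketches how a split brain can arise in a generic two-controller setup, and how a third-party witness can prevent it. It is an illustration under our own assumptions, not a description of Vesk's storage kit: the `Controller` class, the `dirty` set and the witness are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    name: str
    active: bool = False
    blocks: dict = field(default_factory=dict)
    dirty: set = field(default_factory=set)  # blocks written since the last sync

    def write(self, block_id: int, data: str) -> bool:
        # Only a controller that believes it is active accepts writes.
        if not self.active:
            return False
        self.blocks[block_id] = data
        self.dirty.add(block_id)
        return True


def failover_without_fencing(secondary: Controller) -> None:
    # The secondary promotes itself, but nothing forces the old primary
    # to stand down, so both now believe they are active.
    secondary.active = True


def failover_with_witness(a: Controller, b: Controller, winner: str) -> None:
    # A hypothetical witness grants the active role to exactly one side.
    a.active = a.name == winner
    b.active = b.name == winner


def conflicting_blocks(a: Controller, b: Controller) -> list:
    # Blocks that both controllers wrote independently: the split brain damage.
    return sorted(a.dirty & b.dirty)


def simulate(use_witness: bool) -> list:
    a = Controller("A", active=True)
    b = Controller("B")
    if use_witness:
        failover_with_witness(a, b, winner="B")
    else:
        failover_without_fencing(b)
    for ctrl in (a, b):
        ctrl.write(7, f"written-by-{ctrl.name}")
    return conflicting_blocks(a, b)


if __name__ == "__main__":
    print(simulate(use_witness=False))  # both sides wrote block 7
    print(simulate(use_witness=True))   # only the winner wrote it
```

Without fencing, both controllers accept a write to block 7 and the two copies disagree. With the witness picking a single winner, the losing side refuses the write and there is nothing to reconcile.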

To repair the corruption, the platform was taken down at the end of normal working hours. “[T]here were however, too many clusters of corrupted bad blocks to repair, and the timeframes indicated that the process would take many days to complete.”

The decision was taken at 11pm to invoke the DR plan and “failover to a secondary data centre site”.

All affected services were dragged online again early the next morning, though some of Vesk’s Exchange and dedicated SharePoint databases had "failed to start".

"As a result of our documented disaster recovery plans and procedures, [we] were able to keep the downtime to a minimum for the majority of the affected environments," the company claimed.

Vesk said it is reviewing the configuration applied on the storage controllers to figure out how to introduce further fail-safes, and is arranging a plan to switch back to the primary DC from the DR platform.

On its website, Vesk claimed it had had 100 per cent uptime for all of 2012, ’13, ’14, ’15 and even 2016, despite us still having a full quarter of the year to go. ®
