Google loses customers’ data after lightning strikes

20 Aug 2015

Google has released an incident report stating that, following four lightning strikes to the utilities grid that powers its European data centre, some 0.000001pc of data has been permanently lost from its storage systems.

The strikes, which occurred at 4.19pm UTC on Thursday 13 August, hit the utilities grid in Belgium, where the Google europe-west1-b data centre is housed, temporarily knocking some cloud services offline.

Google does have failsafes in place. According to the incident report, in the event of power loss, auxiliary systems restore power quickly, while battery backup ensures everything keeps ticking over in the interim.

However, certain systems remain vulnerable to power outages.

In this case, “some recently written data was located on storage systems which were more susceptible to power failure from extended or repeated battery drain”.

The report states that “approximately 5pc of the Standard Persistent Disks in [zone europe-west1-b] experienced at least one I/O read or write failure during the course of the incident”.

Google engineers immediately began data recovery operations and “in almost all cases the data was successfully committed to stable storage”. Some recent writes, however, were unrecoverable, leading to a permanent loss of 0.000001pc of data.
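To put that percentage in perspective, the arithmetic can be sketched in a few lines of Python. The report does not state how much data the zone holds, so the one-petabyte total below is purely a hypothetical figure chosen for illustration, and the function name is likewise an assumption.

```python
from fractions import Fraction

# Reported share of data permanently lost, in per cent (from the incident report).
LOSS_PERCENT = Fraction(1, 1_000_000)  # 0.000001pc


def bytes_lost(total_bytes: int) -> Fraction:
    """Return the bytes lost for a given total at the reported loss rate."""
    return total_bytes * LOSS_PERCENT / 100


# Hypothetical capacity: 1 petabyte (10**15 bytes). NOT a figure from the report.
HYPOTHETICAL_TOTAL = 10**15

lost = bytes_lost(HYPOTHETICAL_TOTAL)
print(f"{int(lost):,} bytes lost out of {HYPOTHETICAL_TOTAL:,}")
```

Under that assumed total, the loss works out at roughly 10 megabytes, or one byte in every 100 million.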

Following the incident, Google engineers carried out a review of the data centre stack and have noted several opportunities for reducing the risk of an event like this happening again.

These include continuing to upgrade hardware, implementing multiple orthogonal schemes to increase Persistent Disk durability, and improving the response procedures for system engineers.

“We have conducted a thorough analysis of the issue, in which we identified several contributory factors across the full range of our hardware and software technology stack, and we are working to improve these to maximise the reliability of [Google Compute Engine]’s whole storage layer,” read the report.

In the incident report, Google apologised to affected customers.

Kirsty Tobin served as Careers Editor at Siliconrepublic.com up to August 2017

editorial@siliconrepublic.com