Saturday, March 22, 2014

Google Report On The March 17th Outage: 'We Screwed Up Y'all, But It's All Good'

For a few hours on Monday, several Google services came crashing to a halt. Users all over the world were unable to send messages via Hangouts, engage in video chats, or check Google Voice. Some people trying to create spreadsheets with Sheets were met with 502 errors, and people using the multiplayer features of Google Play Games were also affected. All of this apparently resulted from a slip-up during a routine hardware maintenance event, in which the company miscalculated available capacity.

During these maintenance events, Google redirects traffic away from certain backend servers to a new set while engineers perform their work. Because of the capacity miscalculation, the new servers could not handle the redirected traffic. Google engineers started running the maintenance procedure at 8:25 AM and realized something was up roughly twenty minutes later.
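
For the curious, the kind of pre-flight check that guards against this sort of miscalculation can be sketched in a few lines of Python. This is only an illustration, not Google's tooling: the function name, the generic load units, and the 20% headroom default are all assumptions made for the example.

```python
def can_absorb(redirected_load, target_capacity, target_current_load, headroom=0.2):
    """Return True if the target pool can take the redirected traffic
    while still keeping a `headroom` fraction of its capacity free.

    All names and the default headroom are hypothetical; loads and
    capacity just need to share the same unit (requests/s, MB, ...).
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    # Capacity we are willing to use after reserving the safety margin.
    usable = target_capacity * (1 - headroom)
    return target_current_load + redirected_load <= usable


# A pool of 1000 units already carrying 400 can take 300 more, but not 500.
print(can_absorb(300, 1000, 400))
print(can_absorb(500, 1000, 400))
```

The point of the headroom margin is that a plan which only just fits on paper leaves nothing in reserve for a traffic peak.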

The team then brought in additional capacity, halted the maintenance process, and started bringing users back online in waves to avoid overwhelming the system.
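
Bringing users back "in waves" is a common recovery pattern: readmit a small batch, then progressively larger ones, so the backend never sees everyone reconnecting at once. A rough sketch of planning such waves follows. The function name, the doubling growth factor, and the numbers are assumptions for illustration; the incident report does not describe how Google sized its waves.

```python
def plan_waves(total_users, first_wave, growth=2.0):
    """Split total_users into batches that grow geometrically.

    Hypothetical helper: each wave is `growth` times the previous one,
    and the last wave takes whatever is left.
    """
    if total_users < 0 or first_wave < 1 or growth < 1:
        raise ValueError("need total_users >= 0, first_wave >= 1, growth >= 1")
    waves = []
    remaining = total_users
    size = first_wave
    while remaining > 0:
        batch = min(int(size), remaining)  # never readmit more than remain
        waves.append(batch)
        remaining -= batch
        size *= growth
    return waves


print(plan_waves(1000, 100))  # [100, 200, 400, 300]
```

In practice each wave would be followed by a pause to confirm the system is healthy before the next batch is let in.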

These things happen, but if you need the reassurance that Google's learned its lesson, here is a dry list of bullet points the company's provided to show what it's taking away from the experience.

  • Review memory requirements and increase the memory capacity for the affected backend
    servers to meet peak load needs.
  • Implement better monitoring for memory utilization and usage tracking to ensure that servers
    have sufficient capacity available.
  • Lower the alert threshold for errors with the Hangouts service to improve Engineering
    response time.
  • Review internal procedures for bringing up emergency capacity to speed mitigation efforts.
  • Continue work in progress to improve the resilience of Hangouts service during high load
    conditions.
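
The monitoring and alert-threshold items above boil down to comparing utilization against a limit and flagging anything over it. Here is a minimal sketch, assuming per-server memory readings in MB and a single shared capacity figure. The function name, server names, and thresholds are made up for the example and are not taken from Google's report.

```python
def memory_alerts(samples, capacity_mb, threshold=0.8):
    """Return the sorted names of servers at or above the threshold.

    samples: dict mapping server name -> memory used in MB (hypothetical data).
    threshold: fraction of capacity that triggers an alert.
    """
    if capacity_mb <= 0:
        raise ValueError("capacity_mb must be positive")
    return sorted(
        name for name, used in samples.items()
        if used / capacity_mb >= threshold
    )


readings = {"backend-a": 900, "backend-b": 500, "backend-c": 850}
print(memory_alerts(readings, 1000))                  # ['backend-a', 'backend-c']
print(memory_alerts(readings, 1000, threshold=0.4))   # all three servers
```

Lowering the threshold, as in the second call, makes the alert fire earlier, which is the trade-off behind the third bullet: faster response at the cost of more alerts.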

You can read the entire incident report for yourself at the link below.

Google Apps Incident Report - March 17, 2014

Bertel King, Jr.
Born and raised in the rural South, Bertel knows what it's like to live without 4G LTE - or 3G, for that matter. He now lives in the City of Bridges, adjusting to the presence of actual snow. His phone of choice is the HTC One.


source: androidpolice
