Because these old components were left in place, they mistakenly reported usage as zero. The outage would have happened sooner if not for a grace period the company had in place. Unfortunately, that grace period expired, and the company’s automated systems started behaving as though the problem were real. Google had safeguards in place to prevent these kinds of issues, but they weren’t designed to handle the exact case that occurred on Monday morning.
“We would like to apologize for the magnitude of the impact this incident has had on our customers and their businesses,” Google said. “We take very seriously any incident that affects the availability and reliability of our customers, especially incidents that span multiple regions.”
Although the company’s engineers were able to resolve the issue relatively quickly, Google announced plans to implement new measures to avoid a similar situation in the future. In particular, one of its objectives is to communicate better when an outage takes down its services. It also plans to improve its monitoring systems to detect incorrect configurations earlier.