Amazon Reveals AWS Outage Caused by Automation Bug
Amazon Reveals AWS Outage Caused by Automation Bug

Amazon has disclosed the root cause of this week's major AWS outage, which disrupted thousands of services including Signal, Snapchat, and banking platforms. In a detailed post on Thursday, AWS attributed the incident to a bug in its automation software that led to a cascading failure.

The outage stemmed from a latent defect in DynamoDB's automated DNS management system. DynamoDB, AWS's database service, maintains hundreds of thousands of DNS records using automation to monitor and update them. A bug caused an empty DNS record for the US-East-1 region in Virginia, which the automation failed to repair automatically, requiring manual intervention.

As a result, customers could not connect to DynamoDB, affecting numerous sites and applications. Downdetector reported over 8.1 million problem reports from users worldwide, with around 2,000 companies impacted. Services like Eight Sleep's smart beds also suffered, leaving users unable to adjust bed settings via their phone app.

Wide Pickt banner — collaborative shopping lists app for Telegram, phone mockup with grocery list

AWS has disabled the DynamoDB DNS planner and enactor automation globally while it implements fixes and additional protections. The company is working to prevent a recurrence of the issue.

Dr Suelette Dreyfus, a lecturer at the University of Melbourne, commented on the broader implications, noting that the outage highlights the risks of relying on a few major cloud providers. She stated that the internet's original resilience has been eroded by dependence on a handful of tech giants for data storage and services.

Pickt after-article banner — collaborative shopping lists app with family illustration