Amazon has disclosed the root cause of this week's major AWS outage, which disrupted thousands of websites and services globally, including Signal, Snapchat, and smart beds. In a detailed post on Thursday, the company attributed the hours-long disruption to a bug in its automation software, specifically within the DNS management system for its DynamoDB database service.
The outage, which affected the US-East-1 datacentre region in Virginia, stemmed from an empty DNS record that the automated system failed to repair. AWS explained that DynamoDB maintains hundreds of thousands of DNS records using automation to monitor and update them, ensuring capacity, handling hardware failures, and distributing traffic. The bug prevented this process, requiring manual operator intervention to resolve.
As a result, customers were unable to connect to DynamoDB, causing cascading failures across other AWS tools. Downdetector reported over 8.1 million problem reports from users worldwide, with more than 2,000 companies affected, including banking sites, Ring, and Duolingo. Services were restored within hours, but the impact highlighted vulnerabilities in internet infrastructure.
Eight Sleep, a smart bed company, saw customers unable to adjust bed settings via their phone app during the outage. CEO Matteo Franceschetti apologised on X and announced a Bluetooth update for critical functions. Dr Suelette Dreyfus, a lecturer at the University of Melbourne, noted the incident underscores dependence on a few cloud providers, eroding the internet's original resilience.



