Confessions of an S3 Index Subsystem: A Tragedy in US-EAST-1
On Feb 28, 2017, one wrong input turned a routine operation into internet theater.
5 transmissions tagged #postmortem
GitLab’s 2017 outage is a reminder that backup success logs are not the same thing as recovery readiness.
Two famous outages, one quiet lesson: incidents often start long before the pager goes off.
The Facebook outage of October 2021 wasn't about BGP. It was about what happens when your safety mechanisms assume partial failure — and you get total failure.
How a race condition in DynamoDB's own DNS automation cascaded into a 14-hour outage affecting half the internet.