The Six-Hour Vanishing: A BGP Tragedy in Three Acts
One routine command, one silent backbone, and half the planet mashing refresh in unison.
7 transmissions tagged #postmortem
One routine command, one silent backbone, and half the planet mashing refresh in unison.
On Feb 28, 2017, one wrong input turned a routine operation into internet theater.
Cloudflare’s 2019 outage is a reminder that the fastest systems need the calmest guardrails.
GitLab’s 2017 outage is a reminder that backup success logs are not the same thing as recovery readiness.
Two famous outages, one quiet lesson: incidents often start long before the pager goes off.
The Facebook outage of October 2021 wasn't about BGP. It was about what happens when your safety mechanisms assume partial failure — and you get total failure.
How a race condition in DynamoDB's own DNS automation cascaded into a 14-hour outage affecting half the internet.