Yesterday, April 8th 2021, at around 22:00 UTC, Facebook experienced a major outage in which Facebook, Messenger, WhatsApp Web and Instagram were all unavailable.
The last update, which marked the incident as resolved, was posted 3 hours later, so although the status page doesn’t state the duration of the incident, we can assume it was still affecting some users for that long.
Today no company or service can realistically be free of outages and have 100% uptime. Incidents happen, but how you respond to and communicate them makes all the difference to your customers and users.
When it comes to communicating incidents, a status page is a tool that continues to grow in popularity, because it gives users timely information, publicly or privately, when an incident arises.
But what happens if you host your own status page within the same infrastructure where your services live?
Facebook’s status page (https://developers.facebook.com/status/) is a self-hosted, homemade solution, and is apparently hosted within the same infrastructure as the other services affected by this outage.
This puts their status page at risk of being affected by the very same issues it’s meant to report on, which is exactly what happened yesterday, as verified by multiple Hacker News users.
A golden rule for successful incident communication ought to be: never host your status page within your own infrastructure, and if you have to self-host it, at the very least use a completely separate infrastructure, data center, hosting provider and domain.
Hosted status page solutions solve this problem fully, because you delegate the hosting of your status page to a third party. You just need to make sure you don’t share infrastructure with the status page provider you choose.
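As a rough illustration of the separation rule, here is a minimal Python sketch that flags two obvious signs of shared infrastructure: a status page that lives on the same base domain as a service, or that resolves to the same IP addresses. The function names, the `resolver` parameter and the example hostnames are all our own assumptions for illustration, and the base domain logic is deliberately naive (it just takes the last two labels of the hostname).

```python
import socket
from typing import Callable, Iterable, List, Set


def base_domain(hostname: str) -> str:
    """Naive base domain: the last two labels of the hostname.

    Assumption: this ignores multi-part suffixes such as "co.uk".
    """
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def resolve_ips(hostname: str) -> Set[str]:
    """Resolve a hostname to its set of IP addresses with the system resolver."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def shared_infrastructure(
    status_host: str,
    service_hosts: Iterable[str],
    resolver: Callable[[str], Set[str]] = resolve_ips,
) -> List[str]:
    """Return human-readable findings where the status page overlaps a service."""
    findings = []
    status_ips = resolver(status_host)
    for host in service_hosts:
        # Same base domain: a DNS or registrar problem would hit both.
        if base_domain(host) == base_domain(status_host):
            findings.append(
                f"{status_host} shares the domain {base_domain(host)} with {host}"
            )
        # Same IP addresses: both are served from the same machines or edge.
        overlap = status_ips & resolver(host)
        if overlap:
            findings.append(
                f"{status_host} shares IP address(es) {sorted(overlap)} with {host}"
            )
    return findings


if __name__ == "__main__":
    # Hypothetical hostnames and addresses, using a stub resolver
    # so the demo does not touch the network.
    stub = {
        "status.example.com": {"203.0.113.10"},
        "www.example.com": {"203.0.113.10"},
    }
    for finding in shared_infrastructure(
        "status.example.com", ["www.example.com"], resolver=stub.__getitem__
    ):
        print(finding)
```

An empty result only means the domain and IP addresses differ. It does not prove that the data center or hosting provider is separate, so treat it as a quick sanity check rather than a full audit.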
Keeping your status page off your own infrastructure is one of our 4 Key Learnings from Facebook Status Page.