Service Reliability Math that Every Engineer Should Know
Uptime       Downtime (Yearly)
99.00000%    3d 15h 39m
99.90000%    8h 45m 56s
99.99000%    52m 35s
99.99900%    5m 15s
99.99990%    31s
99.99999%    3s
For a service to be up 99.99999% of the time, it can be down for at most about 3 seconds per year. Unfortunately, achieving that milestone is an arduous task, even for the most experienced site reliability engineering teams.
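The arithmetic behind the table is just the period multiplied by the fraction of time the service is allowed to be down. Here is a minimal sketch of that calculation. The function names `allowed_downtime` and `format_duration` are made up for illustration, and the 365.2425-day year is an assumption: it appears to match the table's figures when fractional seconds are truncated.

```python
from datetime import timedelta

# Assumption: an average Gregorian year of 365.2425 days. This seems to
# reproduce the table above when fractional seconds are truncated.
YEAR = timedelta(days=365.2425)


def allowed_downtime(uptime_percent: float, period: timedelta = YEAR) -> timedelta:
    """Maximum downtime permitted over `period` at the given uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period * (1 - uptime_percent / 100)


def format_duration(d: timedelta) -> str:
    """Render a duration as e.g. '8h 45m 56s', dropping zero-valued units."""
    total = int(d.total_seconds())  # truncate fractional seconds
    days, rem = divmod(total, 86_400)
    hours, rem = divmod(rem, 3_600)
    minutes, seconds = divmod(rem, 60)
    parts = [(days, "d"), (hours, "h"), (minutes, "m"), (seconds, "s")]
    return " ".join(f"{value}{unit}" for value, unit in parts if value) or "0s"


for pct in (99.0, 99.9, 99.99, 99.999, 99.9999, 99.99999):
    print(f"{pct:.5f}%  {format_duration(allowed_downtime(pct))}")
```

Each extra 9 divides the result by ten, which is easy to see by running the loop.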
Visualizing service uptime is essential for all types of engineers. Know what your service can realistically deliver. Know what the customer requirements are. Adding an extra "9" looks like a small, linear step, but it cuts the allowed downtime tenfold and is exponential in cost.
For the last 90 days, Stripe's API has had 99.999% uptime, or five 9's. That's a gold standard for many companies. Service-level agreements are more likely to count downtime on a quarterly or rolling basis than on a yearly one. Measuring over a shorter or rolling window gives you a bit more leeway in how downtime is counted, but the magnitudes stay the same. Some will even remove "planned maintenance" from the downtime calculation.
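To make the window and maintenance effects concrete, here is a rough sketch. The helper names and the incident figures are hypothetical, and the way planned maintenance is excluded (dropped from both the downtime and the measured window) is only one possible convention. Real SLAs define this in their own terms.

```python
from datetime import timedelta


def downtime_budget(uptime_percent: float, window: timedelta) -> timedelta:
    """Downtime allowed inside `window` at the given uptime target."""
    return window * (1 - uptime_percent / 100)


def measured_uptime(
    window: timedelta,
    unplanned: timedelta,
    planned: timedelta = timedelta(0),
    exclude_planned: bool = False,
) -> float:
    """Uptime percentage over `window`.

    Assumption: when `exclude_planned` is True, planned maintenance is
    removed from both the counted downtime and the measured window.
    """
    if exclude_planned:
        counted_window = window - planned
        counted_downtime = unplanned
    else:
        counted_window = window
        counted_downtime = unplanned + planned
    return 100 * (1 - counted_downtime / counted_window)


quarter = timedelta(days=90)

# Five 9s over a 90-day window leaves roughly 78 seconds of downtime.
print(downtime_budget(99.999, quarter).total_seconds())

# Hypothetical quarter: 60 seconds of outages plus 30 minutes of maintenance.
outages = timedelta(seconds=60)
maintenance = timedelta(minutes=30)
print(measured_uptime(quarter, outages, maintenance))
print(measured_uptime(quarter, outages, maintenance, exclude_planned=True))
```

With these made-up numbers, the same quarter reports noticeably more 9s once planned maintenance is excluded, which is why it pays to read how an SLA defines downtime.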
I originally posted this on Twitter, and the response was overwhelming. Follow me there for more valuable engineering snippets like this.