Matt Rickard

Service Reliability Math that Every Engineer Should Know

blog.matt-rickard.com

Aug 8, 2021

Uptime       Downtime (Yearly)
99.00000%    3d 15h 39m
99.90000%    8h 45m 56s
99.99000%    52m 35s
99.99900%    5m 15s
99.99990%    31s
99.99999%    3s

For a service to be up 99.99999% of the time, it can be down for at most about 3 seconds per year. Unfortunately, achieving that milestone is an arduous task, even for the most experienced site reliability engineering teams.
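The yearly figures can be reproduced with a few lines of arithmetic. The following is a minimal Python sketch, not the method used to build the table: the function names are hypothetical, and the year length of 365.2425 days (the mean Gregorian year) is an assumption chosen because it appears to match the table's values when fractional seconds are truncated.

```python
# Assumption: one year is the mean Gregorian year of 365.2425 days.
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60


def allowed_downtime_seconds(uptime_percent: float,
                             period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Return the downtime budget, in seconds, for a given uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100 - uptime_percent) / 100 * period_seconds


def format_duration(seconds: float) -> str:
    """Format seconds as e.g. '8h 45m 56s', truncating fractional seconds."""
    remaining = int(seconds)
    parts = []
    for label, size in (("d", 86400), ("h", 3600), ("m", 60), ("s", 1)):
        count, remaining = divmod(remaining, size)
        if count:
            parts.append(f"{count}{label}")
    return " ".join(parts) or "0s"


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999, 99.9999, 99.99999):
        print(f"{pct:.5f}%  {format_duration(allowed_downtime_seconds(pct))}")
```

Passing a different `period_seconds` gives the budget for a month or a quarter instead of a year.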

Visualizing service uptime is essential for all types of engineers. Know what your service can realistically deliver. Know what the customer requirements are. Adding an extra "9" looks like a small step on paper, but it cuts the allowed downtime tenfold, and the cost of getting there grows exponentially.

At the time of writing, Stripe's API has had 99.999% uptime over the last 90 days, or five 9's. That's a gold standard for many companies. Service-level agreements are more likely to count downtime on a quarterly or rolling basis than on a yearly one. Measuring over a shorter window gives you a bit more leeway in how downtime is counted, but the magnitudes stay the same. Some will even remove "planned maintenance" from the downtime calculation.
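To make the rolling-window point concrete, here is a rough Python sketch of the same arithmetic over a 90-day window, with an optional exclusion for planned maintenance. The function names and the sample numbers are hypothetical, and the exclusion rule (dropping maintenance time from both the downtime and the measurement window) is an assumption; real SLAs define this in their own terms.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def downtime_budget_seconds(uptime_percent: float, window_days: float) -> float:
    """Downtime allowed in a window of `window_days` at the given uptime."""
    return (100 - uptime_percent) / 100 * window_days * SECONDS_PER_DAY


def measured_uptime_percent(unplanned_seconds: float,
                            planned_seconds: float = 0.0,
                            window_days: float = 90,
                            exclude_planned: bool = False) -> float:
    """Uptime over the window, optionally ignoring planned maintenance.

    Assumption: when excluded, maintenance time is removed from both the
    downtime and the window, so it counts neither for nor against uptime.
    """
    window = window_days * SECONDS_PER_DAY
    if exclude_planned:
        window -= planned_seconds
        down = unplanned_seconds
    else:
        down = unplanned_seconds + planned_seconds
    if window <= 0:
        raise ValueError("measurement window must be positive")
    return 100 * (1 - down / window)


if __name__ == "__main__":
    # Five 9s over 90 days leaves a budget of a little over a minute.
    print(f"{downtime_budget_seconds(99.999, 90):.2f} s")
    # Hypothetical quarter: 60 s of outages plus 1 h of planned maintenance.
    print(f"{measured_uptime_percent(60, 3600):.5f}%")
    print(f"{measured_uptime_percent(60, 3600, exclude_planned=True):.5f}%")
```

In this made-up quarter, counting the maintenance hour drops the service below four 9's, while excluding it keeps the figure above five 9's, which is why the counting rules in an SLA matter as much as the headline number.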

I originally posted this on Twitter, and the response was overwhelming. Follow me there for more engineering snippets like this.
