
> A 4 9’s means you can only have 6 minutes down a year.

4 9's is 52 minutes of downtime a year. Keep in mind that the single-region EC2 SLA is only 99.99%. And if you rely on a host of services, each with an SLA of 99.99%, your own availability is actually worse than 99.99%. So if you want to actually get to 99.99%, your components have to be better than that, meaning you will have to go multi-region. Achieving this is way harder than this simple step.
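For reference, here is a minimal Python sketch of the conversion from an availability figure to a yearly downtime budget. The function name `downtime_minutes_per_year` is made up for illustration, and a 365-day year is assumed. Under that assumption, four nines comes out at roughly 52.6 minutes, while a budget of about 5 to 6 minutes corresponds to five nines.

```python
# Assumption: a 365-day year (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Return the allowed downtime in minutes per year for a given availability (0..1)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, availability in [("99.9%", 0.999), ("99.99%", 0.9999), ("99.999%", 0.99999)]:
    minutes = downtime_minutes_per_year(availability)
    print(f"{label}: {minutes:.2f} minutes/year")
```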



This is a very salient point. If your service relies on N other services, each with an SLA of 99.99%, the chance of a single request having at least one failure is:

    1 - .9999^N 
Which means if each request makes 10 such service calls, you go from 99.99% to about 99.9%, or from 52 minutes to roughly 8.77 hours of downtime a year.

In most cases you're likely to be making a lot more than 10 service calls.
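A rough Python sketch of how this compounds as the number of calls grows. It assumes every dependency sits at exactly 99.99%, that failures are independent, and a 365-day year; `series_availability` is a hypothetical helper name, not something from a library.

```python
# Assumptions: independent failures, identical per-dependency availability,
# and a 365-day year.
HOURS_PER_YEAR = 365 * 24


def series_availability(per_dependency: float, n: int) -> float:
    """Availability of a request that needs all n dependencies to succeed."""
    return per_dependency ** n


for n in (1, 10, 50, 100):
    availability = series_availability(0.9999, n)
    downtime_hours = (1.0 - availability) * HOURS_PER_YEAR
    print(f"N={n:3d}: availability={availability:.6f}, downtime={downtime_hours:.2f} h/year")
```

For N=10 this lands within rounding of the figure above. Real dependencies rarely fail independently, so treat the numbers as a ballpark rather than a prediction.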


It depends on whether those 9s are in series or in parallel. In series the availabilities multiply, which lowers the overall availability, but in parallel they give you higher availability.
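To make the series vs. parallel distinction concrete, here is a small Python sketch. It assumes components fail independently, which is optimistic for nodes sharing a region or a network; the function names `series` and `parallel` are illustrative only.

```python
from math import prod
from typing import Iterable


def series(availabilities: Iterable[float]) -> float:
    """All components must be up: the availabilities multiply."""
    return prod(availabilities)


def parallel(availabilities: Iterable[float]) -> float:
    """At least one component must be up: the unavailabilities multiply."""
    return 1.0 - prod(1.0 - a for a in availabilities)


# Two 99.99% components chained in series are worse than either one alone.
print(f"series:   {series([0.9999, 0.9999]):.8f}")

# Two 99.99% components as redundant replicas are better than either one alone.
print(f"parallel: {parallel([0.9999, 0.9999]):.8f}")
```

Under the independence assumption, redundancy adds nines quickly, whereas chaining removes them. Correlated failures (a shared region, a shared deploy) shrink the benefit of the parallel case.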


> AWS will use commercially reasonable efforts to make the Included Services each available for each AWS region with a Monthly Uptime Percentage of at least 99.99%, in each case during any monthly billing cycle....

So to achieve 99.99% within a region, every component should have at least 3 nodes, and to do better than that the deployment has to go multi-region, which escalates the costs quickly.

Most applications in reality don't even need four 9s, so this works beautifully for everyone. I work in the outsourcing industry, and in the bad old days we had huge penalties and many rounds of explanations even for applications with no redundancy requirements ;).

But it's just Amazon credit nowadays and no one blinks an eye, so it's win-win for all.


3 nodes of a component in parallel would give you 99.9999% for that component, assuming each node is 99% available and the nodes fail independently.


Yes, but not in AWS land. The committed SLA for the availability of an entire region is still four nines regardless.


Hmmm. That's good to know.

So in that case you have to replicate across three regions to get 6 nines. So one component needs 9 copies running around the world to have 6 nines for the component.


Pretty much. As I said above, it works because most internal apps within the Enterprise don't even need 2 nines.


Updated, Thanks!


It's still meaningful to discuss 99.99% on top of things that are around 99.99%.

For example, let's say you have a service on AWS and all your clients are on AWS. If AWS is down, you are down but so are your clients. But your clients want you to be up 99.99% of the time that AWS is up. As long as both sides are aware of the implications, this is fine.

As long as you're within the same order of magnitude, it can make sense. If a customer wanted me to be up 99.99% of the time on top of a service that is only up 99.9% of the time, I would push back.



