It's so funny to me that every time this happens it turns out Amazon hosts like 80% of its own services on us-east-1 and it's just us-east-1 that's down. Why they have not diversified their own infrastructure to even just us-east-2 is beyond me.
It's probably diversified, there's just a single point of failure that's difficult to get rid of, probably a network load balancer or something.
But like, why not move that outside of us-east-1? That's literally always the one that goes down. Just move that single point of failure outside of the most used region by a large margin on AWS. Seems like a basic reliability engineering practice.
Each site has its own network load balancer; if it were an AWS-wide one, the outage probably wouldn't be limited to a single region. And again, this is just my guess. A hardware load balancer is how the place I worked at managed traffic between several server rooms, and it was usually the main culprit for downtime. It's a very handy device for keeping your network from getting overwhelmed by traffic spikes, but as far as I know it's really hard to make redundant.
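To illustrate why that's hard: active-passive failover between two load balancers needs a health check and something that decides where traffic goes, and that decider is itself a single point of failure, just one layer up. A toy sketch with hypothetical names, not any real AWS component:

```python
# Toy active-passive failover sketch (illustrative only).
# Note that route() itself is now the single point of failure:
# whatever runs the health checks and flips traffic has the
# same redundancy problem the load balancers had.

class LoadBalancer:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def health_check(self):
        return self.healthy


def route(primary, standby):
    """Send traffic to the primary unless its health check fails."""
    if primary.health_check():
        return primary.name
    if standby.health_check():
        return standby.name
    raise RuntimeError("both load balancers are down")


primary = LoadBalancer("lb-primary")
standby = LoadBalancer("lb-standby")
print(route(primary, standby))   # lb-primary

primary.healthy = False          # simulate a hardware failure
print(route(primary, standby))   # lb-standby
```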
Oh, I see what you mean. Yeah, I guess it makes sense that us-east-1 goes down often, since it's the most heavily trafficked and it depends on physical hardware that can fail.
It's more that AWS customers build stuff overwhelmingly in us-east-1.
Why do they get a choice in a specific server? Sure they could specify us-east, but why let them pick us-east-1 specifically?
That sort of thing already exists, as availability zones. Things like EC2 instances live in us-east-1a, us-east-1b, etc., each of which is made up of physically separate data centers. In theory that should provide resilience to even a large-scale outage, but evidently it's not foolproof.
The reason they let you be so precise about where you provision servers is that some applications require the servers to be physically close together, especially high-bandwidth stuff.
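The standard mitigation is to spread instances across those zones so one AZ failure takes out only a fraction of your fleet. A round-robin placement sketch (a hypothetical helper, not the actual EC2 API):

```python
from itertools import cycle

# Hypothetical placement helper: spread n instances round-robin
# across a region's availability zones, so losing one AZ takes
# out at most ceil(n / len(zones)) of them.

def spread_across_azs(n_instances, zones):
    az = cycle(zones)
    return [next(az) for _ in range(n_instances)]


placement = spread_across_azs(5, ["us-east-1a", "us-east-1b", "us-east-1c"])
print(placement)
# ['us-east-1a', 'us-east-1b', 'us-east-1c', 'us-east-1a', 'us-east-1b']
```

As the comment above notes, this only limits the blast radius of a single-AZ failure; it doesn't help when a region-wide dependency breaks.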
In particular, the control plane for Route53 (their DNS product) lives entirely in us-east-1, so the blast radius of a bad outage in that region is enormous.
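The distinction matters because Route53's data plane (actually answering DNS queries) is globally distributed, while changes go through that us-east-1 control plane. For example, a DNS failover setup is expressed as a change batch sent via `ChangeResourceRecordSets`; here's a sketch of building one (zone name, IPs, and health check ID are placeholders), where constructing the dict is harmless but applying it during an outage is exactly what can fail:

```python
# Sketch of a Route53 failover record pair (placeholder values).
# Applying this via boto3's route53.change_resource_record_sets()
# goes through the control plane hosted in us-east-1.

def failover_change_batch(name, primary_ip, secondary_ip, health_check_id):
    def record(set_id, role, ip, hc=None):
        r = {
            "Name": name,
            "Type": "A",
            "SetIdentifier": set_id,
            "Failover": role,            # PRIMARY answers while healthy
            "TTL": 60,
            "ResourceRecords": [{"Value": ip}],
        }
        if hc:
            r["HealthCheckId"] = hc      # only the primary needs a health check
        return r

    return {
        "Changes": [
            {"Action": "UPSERT",
             "ResourceRecordSet": record("primary", "PRIMARY",
                                         primary_ip, health_check_id)},
            {"Action": "UPSERT",
             "ResourceRecordSet": record("secondary", "SECONDARY",
                                         secondary_ip)},
        ]
    }


batch = failover_change_batch("app.example.com.", "192.0.2.1",
                              "192.0.2.2", "placeholder-health-check-id")
print(batch["Changes"][0]["ResourceRecordSet"]["Failover"])  # PRIMARY
```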