My job is literally unraveling shit like Twitter, trying to keep the lights on with minimal knowledge.
The complexity of Twitter and the degree of layoffs they're rumored to have had will be extremely difficult to come back from without experiencing some sort of catastrophic failure first. It will require an absolute assload of people to fix it, more than 1x the people who were fired. Maybe 2x. And they'll have to know what they're doing. Remains to be seen if those catastrophic failures will be unrecoverable.
It just takes one missed maintenance task to bring everything to a pile of rubble, and my understanding is that not only did they fire everyone who does the maintenance tasks, they fired everyone who knows who was supposed to do the maintenance tasks
I think all the people talking about programmers thinking they're irreplaceable need to learn what ops/infra/SRE/security teams do and what happens when you get rid of them.
Thank you. Working with a code stack you've not seen before is like learning a new language, and if the language of the documentation (if indeed there is any documentation) isn't your native language, then it's gonna be Rosetta Stone levels of confusion.
It's not operating normally. It's perceived to be operating normally, but it's a ticking time bomb for catastrophic failure and they laid off all the firefighters.
It's also absolutely hemorrhaging money and losing revenue: advertisers are fleeing the platform, and a lot of standing business deals have been cancelled because there's no one left arranging them. It's not taking down videos with DMCA claims because no one is left to deal with them, which is going to get it into legal trouble. The government is mad because they fired the CIA regime change team. Capital might not care about long-term technical stability, but it does care about short-term profits, intellectual property, and the ability to enforce its hegemony.
I think the next major outage will be scrutinized heavily for sure. And I have a hunch the next outage will be in the next couple of months. Globally observed events that cause traffic spikes timed down to the second tend to put huge strain on very complex systems like most modern social media sites.
Teams of people typically spend weeks doing capacity planning in anticipation of events like the World Cup or New Year's. The World Cup specifically is a notorious SRE nightmare for big social media sites: you'll have people from all over the world posting at the exact same time (like, to the second/minute) when exciting things like goals happen, and they'll be posting photos and video clips and excitedly spamming a million tweets.
By far the #1 most common cause of major site outages is increased load. You've got a complex system with a million little tiny gears, and if one gets overwhelmed and starts to slow down, or if a disk or something fills up, or too many things are connected to a database, or whatever, then the whole thing catches on fire in spectacular ways.
Having everyone in the world tweeting about an incredible save, or a shitty call, or a ridiculous goal all at the same time means the traffic is super concentrated and super high. A lot of work has to go into preparing to keep things online and actively putting out fires while the events are ongoing. I've heard engineers from Instagram talk about how they always have a miserable New Year's because a lot of things always break with the increased load. And it's just not possible or practical to anticipate every potential failure mode.
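To put rough numbers on the load point, here's a toy sketch in Python. Everything in it is made up for illustration: the `simulate` function, the 1000 req/s capacity, the 5000-request queue limit, and the traffic figures are assumptions, not anything measured from Twitter or any real site. It models one component as a bounded queue and shows how a 5-second spike leaves a backlog that is still draining long after the spike is over.

```python
def simulate(arrivals, capacity_per_sec, queue_limit):
    """Toy model of one overloaded component (hypothetical, not a real system).

    arrivals: list of requests arriving in each one-second tick.
    Returns (backlog after each tick, total requests dropped).
    """
    backlog = 0
    dropped = 0
    history = []
    for arriving in arrivals:
        backlog += arriving
        # Serve as many queued requests as capacity allows this second.
        backlog -= min(backlog, capacity_per_sec)
        # Whatever doesn't fit in the queue is rejected (users see errors).
        if backlog > queue_limit:
            dropped += backlog - queue_limit
            backlog = queue_limit
        history.append(backlog)
    return history, dropped


# Made-up traffic: 800 req/s baseline, a 5-second spike to 3000 req/s
# (think "goal scored"), then 10 more seconds back at baseline.
arrivals = [800] * 5 + [3000] * 5 + [800] * 10
history, dropped = simulate(arrivals, capacity_per_sec=1000, queue_limit=5000)

print("backlog per second:", history)
print("requests dropped:", dropped)
```

With these numbers the queue is empty before the spike, hits its limit partway through it, and ten seconds after traffic returns to normal there are still 3000 requests waiting, because the spare capacity at baseline is only 200 req/s. In a real system that backlog shows up as latency, timeouts, and retries that add even more load, which this sketch doesn't model.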
Long ramble, but I'd expect it to be a slow collapse until it isn't. It can't stay online with a skeleton crew forever.
Similarly, big ups to my life partner, Python.
In no way will this be looked at as a "success"
Yeah or full Disney movies
:nicholson-yes: