Hacker News

If you can accept, index and store a given volume of tweets, why should it be impossibly difficult to generate that same volume? I would imagine it would be a lot easier, in fact.

And if you can't afford a 1:1-scale replica, build a 1/10th-scale replica and have your senior engineers use their expertise to ensure it's representative enough for test purposes.



Generating truly representative load is a much harder target. Sure, you could make millions of tweets very fast, but does that actually represent what the production service is dealing with?

Unstable connections. Traffic spikes that can be highly localized or nearly system-wide. System maintenance, planned and reactive. Poorly behaved clients. Malicious traffic. Many more similar factors... and yes, you could throw engineers at each of these and eventually get solutions. You might also end up with a test-engineering team that dwarfs the product engineering team.
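To see why raw volume isn't the same as representative load, here is a toy sketch (Python; all function names and numbers are made up for illustration, not anything Twitter actually runs). It compares a naive evenly-spaced generator against one that injects a localized spike: the total request count is identical, but the peak demand on the system under test differs by an order of magnitude.

```python
import random

def uniform_load(n_requests, duration_s):
    """Naive generator: requests evenly spread over the window."""
    step = duration_s / n_requests
    return [i * step for i in range(n_requests)]

def bursty_load(n_requests, duration_s, spike_fraction=0.5,
                spike_window=0.05, seed=0):
    """Same total volume, but a fraction of requests lands in a short
    spike, loosely mimicking a localized surge (fan-out after a viral
    tweet, a retry storm, etc.). Parameters are illustrative."""
    rng = random.Random(seed)
    n_spike = int(n_requests * spike_fraction)
    spike_start = rng.uniform(0, duration_s * (1 - spike_window))
    times = [rng.uniform(0, duration_s) for _ in range(n_requests - n_spike)]
    times += [spike_start + rng.uniform(0, duration_s * spike_window)
              for _ in range(n_spike)]
    return sorted(times)

def peak_rps(times, bucket_s=1.0):
    """Peak requests per second, measured over fixed-size buckets."""
    buckets = {}
    for t in times:
        b = int(t // bucket_s)
        buckets[b] = buckets.get(b, 0) + 1
    return max(buckets.values()) / bucket_s

# Same volume, wildly different stress on the system under test:
N, D = 100_000, 60.0
print(peak_rps(uniform_load(N, D)))  # steady rate
print(peak_rps(bursty_load(N, D)))   # spike roughly 10x higher
```

A test rig that only replays the uniform case will pass comfortably while the real system falls over during the spike, which is the gap the parent comment is pointing at.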

Also, Twitter doesn't make money: $5B revenue vs $5.5B opex in 2021. They can barely afford to run the actual system, much less a fully representative replica of it. Even a 1/10th replica, which would miss some classes of problems no matter how many engineers you throw at it, would further break the bank. As would the engineering staff to support it.


Twitter can't afford a 1:1 replica for sure, and it would be meaningless without globally distributed production load anyway.

A 1:10 replica is fine, but also astronomically expensive. What's better about it than 1:100? Or 1:10000?

Once you give up on 1:1 (which is reasonable), you're either testing in production or not really testing at all: there will always be code paths that only matter at 1:1 scale.



