
My second project at Google basically killed mocking for me, and I've hardly done it since. Two things happened.

The first was that I worked on a rewrite of something (using GWT no less; it was more than a decade ago) and they decided to have a lot of test coverage and test requirements. That's fine, but the way it was mandated and implemented, everybody just tested their own service and DIed a bunch of mocks in.

The results were entirely predictable. The entire system was incredibly brittle, and a service that had existed for only 8 weeks behaved like legacy code. You could spend half a day fixing mocks in tests for a 30-minute change just because you switched backend services, changed the order of calls, or ended up calling a given service more times than expected. It was horrible and a complete waste of time.

Even the DI aspect of this was horrible because everything used Guice, and there were modules that installed modules that installed modules. Modifying those to return mocks in a test environment was a massive effort that typically resulted in having a different environment (and injector) for test code vs production code, so what are you actually testing?

The second was that about this time the Java engineers at the company went on a massive boondoggle to decide whether to use (and mandate) EasyMock vs Mockito. This, too, was a waste of time. Regardless of the relative merits of either, there's really not that much difference. At no point is it worth completely changing your mocking framework in existing code. Who knows how many engineering man-years were wasted on this.

Mocking encourages bad habits and a false sense of security. The solution is to have dummy versions of services and interfaces that have minimal correct behavior. So you might have a dummy Identity service that does simple lookups on an ID for permissions or metadata. If that's not what you're testing and you just need it to run a test, doing that with a mock is just wrong on so many levels.

I've basically never used mocks since, so much so that I find anyone who is strongly in favor of mocks or has strong opinions on mocking frameworks to be a huge red flag.


I'm not sure I understand. "The solution is to have dummy versions of services and interfaces that have minimal correct behavior".

That's mocks in a nutshell. What other way would you use mocks?


Imagine you're testing a service that creates, queries and deletes users. A fake version of that service might just be a wrapper on a HashMap keyed by ID. It might have several fields like some personal info, a hashed password, an email address, whether you're verified and so on.

Imagine one of your tests covers a user deleting their account. What pattern of calls should it make? You don't really care, other than that the record is deleted (or marked as deleted, depending on retention policy) after you're done.

In the mock world you might mock out calls like deleteUserByID and make sure it's called.

In the fake world, you simply check that the user record is deleted (or marked as such) after the test. You don't really care about what sequence of calls made that happen.

That may sound trivial but it gets less trivial the more complex your example is. Imagine instead you want to clear out all users who are marked for deletion. If you think about the SQL for that you might do a DELETE ... WHERE call, so your API call might look like that. But what if the logic is more complicated? What if EU and NA users suddenly have different retention periods or logging requirements, so they're handled differently?

In a mocking world you would have to change all your expected mocks. In fact, implementing this change might require fixing a ton of tests you don't care about at all and that aren't really broken by the change anyway.

In a fake world, you're testing what the data looks like after you're done, not the specific steps it took to get there.
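
To make that concrete, here's roughly what the fake-based version might look like (a Go sketch with hand-rolled doubles; all names are hypothetical). The test asserts on the state of the fake afterwards, not on which calls were made:

    package users

    import "testing"

    type User struct {
        Email string
    }

    // UserStore is the dependency interface the code under test uses.
    type UserStore interface {
        Create(id string, u User)
        Delete(id string)
        Get(id string) (User, bool)
    }

    // FakeUserStore is a map-backed fake with minimal correct behavior.
    type FakeUserStore struct{ users map[string]User }

    func NewFakeUserStore() *FakeUserStore { return &FakeUserStore{users: map[string]User{}} }

    func (f *FakeUserStore) Create(id string, u User) { f.users[id] = u }
    func (f *FakeUserStore) Delete(id string)         { delete(f.users, id) }
    func (f *FakeUserStore) Get(id string) (User, bool) {
        u, ok := f.users[id]
        return u, ok
    }

    // deleteAccount stands in for the code under test.
    func deleteAccount(s UserStore, id string) { s.Delete(id) }

    func TestDeleteAccount(t *testing.T) {
        store := NewFakeUserStore()
        store.Create("42", User{Email: "a@example.com"})

        deleteAccount(store, "42")

        // State-based assertion: the user is gone, regardless of how.
        if _, ok := store.Get("42"); ok {
            t.Fatal("user 42 should be gone after account deletion")
        }
    }

If the deletion logic later changes (soft delete, per-region retention), this test only needs to change if the observable end state changes.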

Now those are pretty simple examples because there's not much to say about the arguments used and no return values to speak of. Your code might branch differently based on those values, which then changes which calls to expect and with what values.

You're testing implementation details in a really time-consuming yet brittle way.


I am unsure I follow this. I'm generally mocking the things that are dependencies for the thing I'm really testing.

If the dependencies are proper interfaces, I don't care if it's a fake or a mock, as long as the interface is called with the correct parameters. Precisely because I don't want to test the implementation details. The assumption (correctly so) is that the interface provides a contract I can rely on.

In your example, the brittleness simply moves from mocks to data setup for the fake.


The point is that you probably don't care that much how exactly the dependency is called, as long as it is called in such a way that it does the action you want and returns the results you're interested in. The test shouldn't be "which methods of the dependency does this function call?" but rather "does this function produce the right results, assuming the dependency works as expected?".

This is most obvious with complex interfaces where there are multiple ways to call the dependency that do the same thing. For example if my dependency was an SQL library, I could call it with a string such as `SELECT name, id FROM ...`, or `SELECT id, name FROM ...`. For the dependency itself, these two strings are essentially equivalent. They'll return results in a different order, but as long as the calling code parses those results in the right order, it doesn't matter which option I go for, at least as far as my tests are concerned.

So if I write a test that checks that the dependency was called with `SELECT name, id FROM ...`, and later I decide that the code looks cleaner the other way around, then my test will break, even though the code still works. This is a bad test - tests should only fail if there is a bug and the code is not working as expected.
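
Something like this, as a rough Go sketch with hand-rolled doubles and hypothetical names; the first assertion is the brittle one, the second is the one worth keeping:

    package store

    import "testing"

    // DB is the dependency being doubled.
    type DB interface {
        Query(q string) [][]string
    }

    // recordingDB records the query it receives and returns a canned row.
    type recordingDB struct{ gotQuery string }

    func (r *recordingDB) Query(q string) [][]string {
        r.gotQuery = q
        return [][]string{{"alice", "1"}}
    }

    // loadUsers stands in for the code under test.
    func loadUsers(db DB) []string {
        rows := db.Query("SELECT name, id FROM users")
        names := make([]string, 0, len(rows))
        for _, row := range rows {
            names = append(names, row[0])
        }
        return names
    }

    func TestLoadUsers(t *testing.T) {
        db := &recordingDB{}
        names := loadUsers(db)

        // Brittle: breaks if the query is rewritten as "SELECT id, name FROM users",
        // even though the behavior is unchanged.
        if db.gotQuery != "SELECT name, id FROM users" {
            t.Fatalf("unexpected query: %q", db.gotQuery)
        }

        // Robust: assert on the result instead.
        if len(names) != 1 || names[0] != "alice" {
            t.Fatalf("unexpected names: %v", names)
        }
    }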

In practice, you probably aren't mocking SQL calls directly, but a lot of complex dependencies have this feature where there are multiple ways to skin a cat, but you're only interested in whether the cat got skinned. I had this most recently using websockets in Node - there are different ways of checking, say, the state of the socket, and you don't want to write tests that depend on a specific method because you might later choose a different method that is completely equivalent, and you don't want your tests to start failing because of that.


The fakes vs mocks distinction here feels like a terminology debate masking violent agreement. What you’re describing as a “fake” is just a well-designed mock. The problem isn’t mocks as a concept, it’s mocking at the wrong layer. The rule: mock what you own, at the boundaries you control. The chaos you describe comes from mocking infrastructure directly. Verifying “deleteUserById was called exactly once with these params” is testing implementation, not behavior. Your HashMap-backed fake tests the right thing: is the user gone after the operation? Who cares how. The issue is finding the correct layers to validate behavior, not the implementation detail of mocks or fakes… that’s like complaining a hammer smashed a hole in the wall.

In the SQL example, unless you actually use an SQL service as a fake, you cannot really get the fake to do the right thing either. At which point, it's no longer a mock/fake test but an integration/DB test. Network servers are another such class and for the most part can be either mocked or faked using interface methods.

I would argue that (barring SQL), if there are too many ways to skin a cat, it is a design smell. Interfaces are contracts. Even for SQL, I almost always end up using a repository method (findByXxx flavors), so it is very narrow in scope.


The general term I prefer is test double. See https://martinfowler.com/bliki/TestDouble.html for how one might distinguish dummies, fakes, stubs, spies, and mocks.

Of course getting overly pedantic leads to its own issues, much like the distinctions between types of tests.

At my last Java job I used to commonly say things like "mocks are a smell", and avoided Mockito like GP, though it was occasionally useful. PowerMock was also sometimes used because it lets you get into the innards of anything without changing any code, but much more rarely. Ideally you don't need a test double at all.


There are different kinds of mocks.

Checking that function XYZ is called, returning abc when XYZ is called, etc. are the bad kind that people were badly bitten by.

The good kind are a minimally correct fake implementation that doesn't really need any mocking library to build.

Tests should not be brittle and rigidly restate the order of function calls and expected responses. That's a whole lot of ceremony that doesn't really add confidence in the code because it does not catch many classes of errors, and it requires pointless updates to match the implementation 1-1 every time it is updated. It's effectively just writing the implementation twice, if you squint at it a bit.


The second kind is usually referred to as "fakes", which are not a type of mock but a (better) alternative to mocks.

In reflection heavy environments and with injection and reflection heavy frameworks the distinction is a bit more obvious and relevant (.Net, Java). In some cases the mock configuration blossoms into essentially a parallel implementation, leading to the brittleness discussed earlier in the thread.

Technically creating a shim or stub object is mocking, but “faking” isn’t using a mocking framework to track incoming calls or internal behaviours. Done properly, IMO, you’re using inheritance and the opportunity through the TDD process to polish & refine the inheritance story and internal interface of key subsystems. Much like TDD helps design interfaces by giving you earlier external interface consumers, you also get early inheritors if you are, say, creating test services with fixed output.

In ideal implementations those stub or “fake” services answer the “given…” part of user stories, leaving minimalistic, focused tests. Delivering hardcoded dictionaries of test data built with appropriate helpers is minimal and easy to keep up to date without undue extra work, and doing that kind of stub work often identifies re-use needs/benefits in the code-base early. The exact features needed to evolve the system as unexpected change requests roll in are there already, as QA/end-users are the system's second rodeo, not its first.

The mocking antipatterns cluster around ORM misuse, tend to leak implementation details (leading to those brittle tests), and are often co-morbid with anemic domains and other cargo cult cruft. Needing intense mocking utilities and frameworks on a system you own is a smell.

For corner cases and exhaustiveness I prefer to be able to do meaningful integration tests in memory as far as possible too (in conjunction with more comprehensive tests). Faster feedback means faster work.


Why is check if XYZ is called with return value ABC bad, as long as XYZ is an interface method?

Why is a minimally correct fake any better than a mock in this context?

Mocks are not really about order of calls unless you are talking about different return values on different invocations. A fake simply moves the cheese to setting up data correctly, as your tests and logic change.

Not a huge difference either way.


The point is to test against a model of the dependency, not just the expected behaviour of the code under test. If you just write a mock that exactly corresponds to the test that you're running, you're not testing the interface with the underlying system, you're just running the (probably already perfectly understandable) unit through a rote set of steps, and that's both harder to maintain and less useful than testing against a model of the underlying system.

(And IMO this should only be done for heavyweight or difficult to precisely control components of the system where necessary to improve test runtime or expand the range of testable conditions. Always prefer testing as close to the real system as reasonably practical)


But mocks are a model of the dependency. I don't quite see how a fake is a better model than a mock.

In any case, I agree testing close to a real system, with actual dependencies where possible is better. But that's not done with a fake.


The kind of mocks the OP is arguing against are not really a model of the dependency, they're just a model of a particular execution sequence in the test, because the mock is just following a script. Nothing in it ensures that the sequence is even consistent with any given understanding of how the dependency works, and it will almost certainly need updating when the code under test is refactored.

My point is that a fake doesn't magically fix this issue. Both are narrow models of the underlying interface. I still don't quite understand why a mock is worse than a fake when it comes to narrow models of the interface. If there is a method that needs to be called with a specific setup, there is no practical difference between a fake and a mock.

Again, none of this is a replacement for writing integration tests where possible. Mocks have a place in the testing realm and they are not an inherently bad tool.


Mocking is testing how an interface is used, rather than testing an implementation. That's why it requires some kind of library support. Otherwise you'd just be on the hook for providing your own simple implementations of your dependencies.

Heavy mocks usage comes from dogmatically following the flawed “most tests should be unit tests” prescription of the “testing pyramid,” as well as a strict adherence to not testing more than one class at a time. This necessitates heavy mocking, which is fragile, terrible to refactor, leads to lots of low-value tests. Sadly, AI these days will generate tons of those unit tests in the hands of those who don’t know better. All in all leading to the same false sense of security and killing development speed.

I get what you are saying, but you can have your cake and eat it too. Fast, comprehensive tests that cover most of your codebase. Test through the domain, employ Fakes at the boundaries.

https://asgaut.com/use-of-fakes-for-domain-driven-design-and...


“The solution is to have dummy versions of services and interfaces that have minimal correct behavior”

If you aren’t doing this with mocks then you’re doing mocks wrong.


Martin Fowler draws a useful distinction between mocks, fakes, and stubs¹. Fakes contain some amount of internal logic, e.g. a remote key-value store can be faked with a hashmap. Stubs are a bit dumber—they have no internal logic & just return pre-defined values. Mocks, though, are rigged to assert that certain calls were made with certain parameters. You write something like `myMock.Expect("sum").Args(1, 2).Returns(3)`, and then when you call `myMock.AssertExpectations()`, the test fails unless you called `myMock.sum(1, 2)` somewhere.

People often use the word "mock" to describe all of these things interchangeably², and mocking frameworks can be useful for writing stubs or fakes. However, I think it's important to distinguish between them, because tests that use mocks (as distinct from stubs and fakes) are tightly coupled to implementation, which makes them very fragile. Stubs are fine, and fakes are fine when stubs aren't enough, but mocks are just a bad idea.

[1]: https://martinfowler.com/articles/mocksArentStubs.html

[2]: The generic term Fowler prefers is "test double."
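
Sketching that distinction with hand-written doubles in Go (hypothetical names; the same split applies if a mocking framework generates these for you):

    package cache

    import "testing"

    // KV is the dependency being doubled, e.g. a remote key-value store.
    type KV interface {
        Get(key string) (string, bool)
        Put(key, value string)
    }

    // Stub: no logic, just canned answers.
    type stubKV struct{}

    func (stubKV) Get(string) (string, bool) { return "fixed-value", true }
    func (stubKV) Put(string, string)        {}

    // Fake: minimal working logic (a map standing in for the remote store).
    type fakeKV map[string]string

    func (f fakeKV) Get(k string) (string, bool) { v, ok := f[k]; return v, ok }
    func (f fakeKV) Put(k, v string)             { f[k] = v }

    // Mock: records how it was called so the test can assert on the calls themselves.
    type mockKV struct {
        fakeKV
        putKeys []string
    }

    func (m *mockKV) Put(k, v string) {
        m.putKeys = append(m.putKeys, k)
        m.fakeKV.Put(k, v)
    }

    func TestDoubles(t *testing.T) {
        var _ KV = stubKV{} // all three satisfy the same interface

        fake := fakeKV{}
        fake.Put("a", "1")
        if v, _ := fake.Get("a"); v != "1" {
            t.Fatal("fake should behave like a real store")
        }

        mock := &mockKV{fakeKV: fakeKV{}}
        mock.Put("a", "1")
        // Mock-style assertion: couples the test to the call pattern.
        if len(mock.putKeys) != 1 || mock.putKeys[0] != "a" {
            t.Fatal("expected exactly one Put call with key a")
        }
    }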


In part, you’re right, but there’s a practical difference between mocking and a good dummy version of a service. Take DynamoDB local as an example: you can insert items and they persist, delete items, delete tables, etc. Or in the Ruby on Rails world, one often would use SQLite as a local database for tests even if using a different DB in production.

Going further, there’s the whole test containers movement of having a real version of your dependency present for your tests. Of course, in a microservices world, bringing up the whole network of dependencies is extremely complicated and likely not warranted.


I use test containers and similar methods to test against a "real" db, but I also use mocks. For example, to mock the response of a third-party API: I can't very well spin that up in a test container. Another example is simply timestamps. You can't really test time-related stuff without mocking a timestamp provider.

It is a hassle a lot of the time, but I see it as a necessary evil.
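
For the timestamp case, the usual trick is to inject a tiny clock interface rather than mock time itself; a minimal Go sketch with hypothetical names:

    package billing

    import "time"

    // Clock abstracts "now" so tests can control it.
    type Clock interface {
        Now() time.Time
    }

    // realClock is what production code uses.
    type realClock struct{}

    func (realClock) Now() time.Time { return time.Now() }

    // fixedClock is the test double: always returns the same instant.
    type fixedClock struct{ t time.Time }

    func (f fixedClock) Now() time.Time { return f.t }

    // IsExpired is example code under test that depends on the clock.
    func IsExpired(c Clock, deadline time.Time) bool {
        return c.Now().After(deadline)
    }

A test then passes fixedClock{t: someInstant} where production passes realClock{}.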


You can use a library like [1] to mock out a real HTTP server with responses.

[1] https://www.mock-server.com/


I'd go a bit farther — "mock" is basically the name for those dummy versions.

That said, there is a massive difference between writing mocks and using a mocking library like Mockito — just like there is a difference between using dependency injection and building your application around a DI framework.


> there is a massive difference between writing mocks and using a mocking library like Mockito

How to reconcile the differences in this discussion?

The comment at the root of the thread said "my experience with mocks is they were over-specified and lead to fragile services, even for fresh codebases. Using a 'fake' version of the service is better". The reply then said "if mocking doesn't provide a fake, it's not 'mocking'".

I'm wary of blanket sentiments like "if you ended up with a bad result, you weren't mocking". -- Is it the case that libraries like mockito are mostly used badly, but that correct use of them provides a good way of implementing robust 'fake services'?


In my opinion, we do mocking the exact opposite of how we should be doing it — mocks shouldn't be written by the person writing tests, but rather by the people who implemented the service being mocked. It's exceedingly rare to see this pattern in the wild (and, frustratingly, I can't think of an example off the top of my head), but I know I've had good experiences with cases of package `foo` offering a `foo-testing` package that provides mocks. Turns out that mocks are a lot more robust when they're built on top of the same internals as the production version, and doing it that way also obviates much of the need for general-purpose mocking libraries.

I think the argument they're making is that once you have this, you already have an easy way to test things that doesn't require bringing in an entire framework.

The difference, IMO, between a mock and a proper "test" implementation is that traditionally a mock only exists to test interface boundaries, and the "implementation" is meant to be as much of a noop as possible. That's why the default behavior of almost any "automock" is to implement an interface by doing nothing and returning nothing (or perhaps default-initialized values) and provide tools for just tacking assertions onto it. If it was a proper implementation that just happened to be in-memory, it wouldn't really be a "mock", in my opinion.

For example, let's say you want to test that some handler is properly adding data to a cache. IMO the traditional mock approach that is supported by mocking libraries is to take your RedisCache implementation and create a dummy that does nothing, then add assertions that, say, the `set` method gets called with some set of arguments. You can add return values to the mock too, but I think this is mainly in service of just making the code run, not actually implementing anything.

Meanwhile, you could always make a minimal "test" implementation (I think these are sometimes called "fakes", traditionally, though I think this nomenclature is even more confusing) of your Cache interface that actually does behave like an in-memory cache, then your test could assert as to its contents. Doing this doesn't require a "mocking" library, and in this case, what you're making is not really a "mock" - it is, in fact, a full implementation of the interface, that you could use outside of tests (e.g. in a development server.) I think this can be a pretty good middle ground in some scenarios, especially since it plays along well with in-process tools like fake clocks/timers in languages like Go and JavaScript.

Despite the pitfalls, I mostly prefer to just use the actual implementations where possible, and for this I like testcontainers. Most webserver projects I write/work on naturally require a container runtime for development for other reasons, and testcontainers is glue that can use that existing container runtime setup (be it Docker or Podman) to pretty rapidly bootstrap test or dev service dependencies on-demand. With a little bit of manual effort, you can make it so that your normal test runner (e.g. `go test ./...`) can run tests normally, and automatically skip anything that requires a real service dependency in the event that there is no Docker socket available. (Though obviously, in a real setup, you'd also want a way to force the tests to be enabled, so that you can hopefully avoid an oopsie where CI isn't actually running your tests due to a regression.)
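
For the "skip when there's no Docker socket" part, a rough sketch of the kind of helper I mean (this is not the testcontainers API itself; the socket path and env var name are assumptions):

    package integration

    import (
        "net"
        "os"
        "testing"
        "time"
    )

    // requireDocker skips the test unless a Docker/Podman socket is reachable.
    // Setting FORCE_INTEGRATION=1 turns the skip into a hard failure so CI can't
    // silently stop running these tests.
    func requireDocker(t *testing.T) {
        t.Helper()
        const socket = "/var/run/docker.sock" // a real setup may also want to honor DOCKER_HOST
        conn, err := net.DialTimeout("unix", socket, time.Second)
        if err != nil {
            if os.Getenv("FORCE_INTEGRATION") == "1" {
                t.Fatalf("integration tests forced on but no Docker socket: %v", err)
            }
            t.Skipf("skipping: no Docker socket available (%v)", err)
        }
        conn.Close()
    }

Each test that needs a real dependency then calls requireDocker(t) before spinning up its containers.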


my time at google likewise led me to the conclusion that fakes were better than mocks in pretty much every case (though I was working in c++ and python, not java).

edit: of course google was an unusual case because you had access to all the source code. I daresay there are cases where only a mock will work because you can't satisfy type signatures with a fake.


I don't use dummy services and I don't use mocking. I write simulators to test against HW or big services that are not available for testing.

Simulators need to be complete for their use cases or they cannot be used for testing.


Mockito, in every case I had to use it, was a last resort because a third-party library didn't lend itself to mocking, or because we were bringing legacy code under test and using it just long enough to refactor it out.

It should never be the first tool. But when you need it, it’s very useful.


Story time. This has basically nothing to do with this post other than it involves a limit of 10,000 but hey, it's Christmas and I want to tell a story.

I used to work for Facebook and many years ago people noticed you couldn't block certain people but the one that was most public was Mark Zuckerberg. It would just say it failed or something like that. And people would assign malice or just intent to it. But the truth was much funnier.

Most data on Facebook is stored in a custom graph database that basically only has 2 tables, sharded across thousands of MySQL instances, but almost always accessed via an in-memory write-through cache, also custom. It's not quite a cache because it has functionality built on top of the database that accessing the database directly wouldn't have.

So a person is an object and following them is an edge. Importantly, many such edges were one-way so it was easy to query if person A followed B but much more difficult to query all the followers of B. This was by design to avoid hot shards.

So I lied when I said there were 2 tables. There was a third that was an optimization that counted certain edges. So if you see "10.7M people follow X" or "136K people like this", it's reading a count, not doing a query.

Now there was another optimization here: only the last 10,000 edges for a given (object ID, edge type) pair were kept in memory. You generally wanted to avoid dealing with anything older than that because you'd start hitting the database, and that was generally a huge problem on a large, live query or update. As an example, it was easy to query the last 10,000 people or pages you've followed.

You should be able to see where this is going. All that had happened was 10,000 people had blocked Mark Zuckerberg. Blocks were another kind of edge that was bidirectional (IIRC). The system just wasn't designed for a situation where more than 10,000 people wanted to block someone.

This got fixed many years ago because somebody came along and built a separate system to handle blocking that didn't have the 10,000 limit. I don't know the implementation details but I can guess. There was a separate piece of reverse-indexing infrastructure for doing queries on one-way edges. I suspect that was used.

Anyway, I love this story because it's funny how a series of technical decisions can lead to behavior and a perception nobody intended.


Merry Christmas! This is why I like hackernews.

People should go to jail for this.

Anyone who has worked on a large migration eventually lands on a pattern that goes something like this:

1. Double-write to the old system and the new system. Nothing uses the new system;

2. Verify the output in the new system vs the old system with appropriate scripts. If there are issues, which there will be for a while, go back to (1);

3. Start reading from the new system with a small group of users and then an increasingly large group. Still use the old system as the source of truth. Log whenever the output differs. Keep making changes until it always matches;

4. Once you're at 100% rollout you can start decommissioning the old system.

This approach is incremental, verifiable and reversible. You need all of these things. If you engage in a massive rewrite in a silo for a year or two you're going to have a bad time. If you have no way of verifying your new system's output, you're going to have a bad time. In fact, people are going to die, as is the case here.
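
A bare-bones sketch of steps 1-3 (Go, hypothetical interfaces): a wrapper that double-writes, keeps the old system as the source of truth, and logs any read mismatches:

    package migrate

    import "log"

    // Store is the interface both the old and new systems implement.
    type Store interface {
        Write(key, value string) error
        Read(key string) (string, error)
    }

    // shadowStore double-writes and shadow-reads during the migration.
    type shadowStore struct {
        oldStore, newStore Store
    }

    func (s *shadowStore) Write(key, value string) error {
        if err := s.newStore.Write(key, value); err != nil {
            log.Printf("migration: new-system write failed for %q: %v", key, err)
        }
        return s.oldStore.Write(key, value) // old system stays authoritative
    }

    func (s *shadowStore) Read(key string) (string, error) {
        oldVal, err := s.oldStore.Read(key)
        if newVal, newErr := s.newStore.Read(key); newErr != nil || newVal != oldVal {
            log.Printf("migration: mismatch for %q: old=%q new=%q err=%v", key, oldVal, newVal, newErr)
        }
        return oldVal, err
    }

Once the mismatch logs stay quiet across the full rollout, you can flip reads to the new system and eventually drop the wrapper.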

If you're going to accuse someone of a criminal act, a system just saying it happened should NEVER be sufficient. It should be able to show its work. The person or people who are ultimately responsible for turning a fraud detection into a criminal complaint should themselves be criminally liable if they make a false complaint.

We had a famous example of this with Hertz mistakenly reporting cars stolen, something they ultimately had to pay for in a lawsuit [1] but that's woefully insufficient. It is expensive, stressful and time-consuming to have to criminally defend yourself against a felony charge. People will often be forced to take a plea because absolutely everything is stacked in the prosecution's favor despite the theoretical presumption of innocence.

As such, an erroneous or false criminal complaint by a company should itself be a criminal charge.

In Hertz's case, a human should eyeball the alleged theft and look for records like "do we have the car?", "do we know where it is?" and "is there a record of them checking it in?"

In the UK post office scandal, a detection of fraud from accounting records should be verified by comparison to the existing system in a transition period AND, more so in the beginning, by double-checking results with forensic accountants (actual humans) before any criminal complaint is filed.

[1]: https://www.npr.org/2022/12/06/1140998674/hertz-false-accusa...


I realize scale makes everything more difficult but at the end of the day, Netflix is encoding and serving several thousand videos via a CDN. It can't be this hard. There are a few statements in this that gave me pause.

The core problem seems to be development in isolation. Put another way: microservices. This post hints at microservices having complete autonomy over their data storage and developing their own GraphQL models. The first is normal for microservices (but an indictment at the same time). The second is... weird.

The whole point of GraphQL is to create a unified view of something, not to have 23 different versions of "Movie". Attributes are optional. Pull what you need. Common subsets of data can be organized in fragments. If you're not doing that, why are you using GraphQL?

So I worked at Facebook and may be a bit biased here because I encountered a couple of ex-Netflix engineers in my time who basically wanted to throw away FB's internal infrastructure and reinvent Netflix microservices.

Anyway, at FB there is a Video GraphQL object. There aren't 23 or 7 or even 2.

Data storage for most things was via a write-through in-memory graph database called TAO that persisted things to sharded MySQL servers. On top of this, you'd use EntQL to add a bunch of behavior to TAO like permissions, privacy policies, observers and such. And again, there was one Video entity. There were offline data pipelines that would generally process logging data (i.e. outside TAO).

Maybe someone more experienced with microservices can speak to this: does UDA make sense? Is it solving an actual problem? Or just a self-created problem?


I think they are just trying to put in place the common data model that, as you point out, they need.

(So their micro services can work together usefully and efficiently -- I would guess that currently the communication burden between microservice teams is high and still is not that effective.)

> The whole point of GraphQL is to create a unified view of something

It can do that, but that's not really the point of GraphQL. I suppose you're saying that's how it was used at FB. That's fine, IMO, but it sounds like this NF team decided to use something more abstract for the same purpose.

I can't comment on their choices without doing a bunch more analysis, but in my own experience I've found off-the-shelf data modeling formats have too much flexibility in some places (forcing you to add additional custom controls or require certain usage patterns) and not enough in others (forcing you to add custom extensions). The nice thing about your own format is you can make it able to express everything you want and nothing you don't. And have a well-defined projection to Graphql (and sqlite and oracle and protobufs and xml and/or whatever other thing you're using).


I totally agree. Especially with Fusion it’s very easy to establish core types in self-contained subgraphs and then extend those types in domain-specific subgraphs. IMO the hardest part about this approach is just namespacing all the things, because GraphQL doesn’t have any real conventions for organizing service- (or product-) specific types.


> The whole point of GraphQL is to create a unified view of something, not to have 23 different versions of "Movie".

GraphQL is great at federating APIs, and is a standardized API protocol. It is not a data modeling language. We actually tried really hard with GraphQL first.


>at the end of the day, Netflix is encoding and serving several thousand videos via a CDN. It can't be this hard

Yeah maybe 10 years ago, but today Netflix is one of the top production companies on the planet. In the article, they even point to how this addresses their issues in content engineering

https://netflixtechblog.com/netflix-studio-engineering-overv...

https://netflixtechblog.com/globalizing-productions-with-net...


So I've worked for Google (and Facebook) and it really drives home just how cheap hardware is and how, most of the time, optimizing code just isn't worth it.

More than a decade ago Google had to start managing their resource usage in data centers. Every project has a budget. CPU cores, hard disk space, flash storage, hard disk spindles, memory, etc. And these are generally convertible to each other so you can see the relative cost.

Fun fact: even though at the time flash storage was ~20x the cost of hard disk storage, it was often cheaper net because of the spindle bottleneck.

Anyway, all of these things can be turned into software engineer hours, often called "mili-SWEs" meaning a thousandth of the effort of 1 SWE for 1 year. So projects could save on hardware and hire more people or hire fewer people but get more hardware within their current budgets.

I don't remember the exact number of CPU cores that amounted to a single SWE but IIRC it was in the thousands. So if you spend 1 SWE-year working on optimization across your project and you're not saving 5000 CPU cores, it's a net loss.
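
The break-even arithmetic is simple (the numbers below are placeholders, not Google's actual rates):

    package main

    import "fmt"

    func main() {
        const sweYearCost = 300000.0 // fully loaded cost of one SWE-year (placeholder)
        const coreYearCost = 60.0    // cost of one CPU core for a year (placeholder)

        // One SWE-year of optimization work only pays off if it saves at least
        // this many cores, year after year.
        fmt.Printf("break-even: %.0f cores per SWE-year\n", sweYearCost/coreYearCost)
    }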

Some projects were incredibly large and used much more than that so optimization made sense. But so often it didn't, particularly when whatever code you wrote would probably get replaced at some point anyway.

The other side of this is that there is (IMHO) a general usability problem with the Web in that it simply shouldn't take the resources it does. If you know people who had to do (or still do) data entry for their jobs, you'll know that the mouse is pretty inefficient. The old terminals from 30-40+ years ago that were text-based had some incredibly efficient interfaces at a tiny fraction of the resource usage.

I had expected that at some point the Web would be "solved" in the sense that there'd be a generally expected technology stack and we'd move on to other problems but it simply hasn't happened. There's still a "framework of the week" and we're still doing dumb things like reimplementing scroll bars in user code that don't work right with the mouse wheel.

I don't know how to solve that problem or even if it will ever be "solved".


I worked there too and you're talking about performance in terms of optimal usage of CPU on a per-project basis.

Google DID put a ton of effort into two other aspects of performance: latency, and overall machine utilization. Both of these were top-down directives that absorbed a lot of time and attention from thousands of engineers. The salary costs were huge. But, if you're machine constrained you really don't want a lot of cores idling for no reason even if they're individually cheap (because the opportunity cost of waiting on new DC builds is high). And if your usage is very sensitive to latency then it makes sense to shave milliseconds off because of business metrics, not hardware $ savings.


The key part here is "machine utilization" and absolutely there was a ton of effort put into this. I think before my time servers were allocated to projects, but even early on in my time at Google, Borg had already adopted shared machine usage and there was a whole system of resource quota implemented via cgroups.

Likewise there have been many optimization projects and they used to call these out at TGIF. No idea if they still do. One I remember was reducing the health checks via UDP for Stubby and given that every single Google product extensively uses Stubby then even a small (5%? I forget) reduction in UDP traffic amounted to 50,000+ cores, which is (and was) absolutely worth doing.

I wouldn't even put latency in the same category as "performance optimization" because often you decrease latency by increasing resource usage. For example, you may send duplicate RPCs and wait for the fastest to reply. That could be doubling or tripling the effort.


Except you’re self selecting for a company that has high engineering costs, big fat margins to accommodate expenses like additional hardware, and lots of projects for engineers to work on.

The evaluation needs to happen in the margins: even if it saves pennies/year on the dollar, it's better to have those engineers doing that than to have them idling.

The problem is that almost no one is doing it, because the way we make these decisions has nothing to do with the economic calculus behind it; most people just do “what Google does”, which explains a lot of the dysfunction.


I think the parent's point is that if Google with millions of servers can't make performance optimization worthwhile, then it is very unlikely that a smaller company can. If salaries dominate over compute costs, then minimizing the latter at the expense of the former is counterproductive.

> The evaluation needs to happen in the margins, even if it saves pennies/year on the dollar, it’s best to have those engineers doing that than have them idling.

That's debatable. Performance optimization almost always leads to a complexity increase. Doubled performance can easily cause quadrupled complexity. Then one has to consider whether the maintenance burden is worth the extra performance.


> it is very unlikely that a smaller company can.

I think it's the reverse: a small company doesn't have the liquidity, buying power or ability to convert more resource into more money like Google.

And of course a lot of small companies will be paying Google with a fat margin to use their cloud.

Getting by with fewer resources, or even with reduced on-premise hardware, will be a way bigger win. That's why they'll pay a DBA full time to optimize their database if it reduces costs by 2 to 3x the salary. Or have a full team of infra guys mostly dealing with SRE and performance.


> If salaries dominate over compute costs, then minimizing the latter at the expense of the former is counterproductive.

And with client side software, compute costs approach 0 (as the company isn’t paying for it).


> I don't remember the exact number of CPU cores amounted to a single SWE but IIRC it was in the thousands.

I think this probably holds true for outfits like Google because 1) on their scale "a core" is much cheaper than average, and 2) their salaries are much higher than average. But for your average business, even large businesses? A lot less so.

I think this is a classic "Facebook/Google/Netflix/etc. are in a class of their own and almost none of their practices will work for you"-type thing.


Maybe not to the same extent, but an AWS EC2 m5.large VM with 2 cores and 8 GB RAM costs ~$500/year (1 year reserved). Even if your engineers are being paid $50k/year, that's the same as 100 VMs or 200 cores + 800 GB RAM.


    I don't know how to solve that problem or even if it will ever be "solved".
It will not be “solved” because it’s a non-problem.

You can run a thought experiment imagining an alternative universe where human resource were directed towards optimization, and that alternative universe would look nothing like ours. One extra engineer working on optimization means one less engineer working on features. For what exactly? To save some CPU cycles? Don’t make me laugh.


Google doesn't come up with better compression and binary serialization formats just for fun--it improves their bottom line.


Google has over the years tried to get several new languages off the ground. Go is by far the most successful.

What I find fascinating is that all of them that come to mind were conceived by people who didn't really understand the space they were operating in and/or had no clear idea of what problem the language solved.

There was Dart, which was originally intended to be shipped as a VM in Chrome until the Chrome team said no.

But Go was originally designed as a systems programming language. There's a lot of historical revisionism around this now but I guarantee you it was. And what's surprising about that is that having GC makes that an immediate non-starter. Yet it happened anyway.

The other big surprise for me was that Go launched without external dependency management as a first-class citizen of the Go ecosystem. For the longest time there were two methods of declaring dependencies: either with URLs (usually Github) in the import statements or with badly supported manifests. Like, just copy what Maven did for Java. Not the bloated XML of course.

But Go has done many things right like having a fairly simple (and thus fast to compile) syntax, shipping with gofmt from the start and favoring error return types over exceptions, even though it's kind of verbose (and Rust's matching is IMHO superior).

Channels were a nice idea but I've become convinced that cooperative async-await is a superior programming model.

Anyway, Go never became the C replacement the team set out to make. If anything, it's a better Python in many ways.

Good luck to Ian in whatever comes next. I certainly understand the issues he faced, which is essentially managing political infighting and fiefdoms.

Disclaimer: Xoogler.


Some of us believe GC[0] isn't an impediment for systems programming languages.

They haven't taken off as Xerox PARC, ETHZ, DEC, Olivetti, Compaq, and Microsoft desired, more due to politics, external or internal (in MS's case), than technical impediments.

Hence why I like the way Swift and Java/Kotlin[1] are pushed on mobile OSes, to the point "my way or get out".

I might quibble with many of Go's minimalist language design decisions; however, I will gladly advocate for its suitability as a systems language.

The kind of systems we used to program for a few decades ago, compilers, linkers, runtimes, drivers, OS services, bare metal deployments (see TamaGo),...

[0] - Any form of GC, as per computer science definition, not street knowledge.

[1] - The NDK is relatively constrained, and nowadays there is Kotlin Native as well.


> Channels were a nice idea but I've become convinced that cooperative async-await is a superior programming model.

Curious as to your reasoning around this? I've never heard this opinion before from someone not biased by their programming language preferences.


Sure. First you need to separate buffered and unbuffered channels.

Unbuffered channels basically operate like cooperative async/await but without the explicitness. In cooperative multitasking, putting something on an unbuffered channel is essentially a yield().

An awful lot of day-to-day programming is servicing requests. That could be HTTP, an RPC (eg gRPC, Thrift) or otherwise. For this kind of model IMHO you almost never want to be dealing with thread primitives in application code. It's a recipe for disaster. It's so easy to make mistakes. Plus, you often need to make expensive calls of your own (eg reading from or writing to a data store of some kind) so there's not really a performance benefit.

That's what makes cooperative async/await so good for application code. The system should provide compatible APIs for doing network requests (etc). You never have to worry about out-of-order processing, mutexes, thread pool starvation or a million other issues.

Which brings me to the more complicated case of buffered channels. IME buffered channels are almost always a premature optimization that is often hiding concurrency issues. As in, if that buffered channel fills up, you may deadlock where you otherwise wouldn't have if the buffer weren't full. That can be hard to test for or find until it happens in production.
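
A contrived Go sketch of how a buffer can mask a missing consumer until the day it fills up:

    package main

    import "fmt"

    func main() {
        const bufSize = 4
        ch := make(chan int, bufSize)

        // No consumer is running yet; this "works" only because the buffer
        // absorbs every send. Change the loop to bufSize+1 and the extra send
        // blocks forever: the runtime reports "all goroutines are asleep - deadlock!".
        for i := 0; i < bufSize; i++ {
            ch <- i
        }
        close(ch)

        for v := range ch {
            fmt.Println(v)
        }
    }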

But let's revisit why you're optimizing this with a buffered channel. It's rare that you're CPU-bound. If the channel consumer talks to the network any perceived benefit of concurrency is automatically gone.

So async/await doesn't allow you to buffer and create bugs for little benefit and otherwise acts like unbuffered channels. That's why I think it's a superior programming model for most applications.


Buffers are there to deal with flow variances. What you are describing as the "ideal system" is a clockwork. Your async-awaits are meshed gears. For this approach to be "ideal" it needs to be able to uniformly handle the dynamic range of the load on the system. This means every part of the clockwork requires the same performance envelope. (a little wheel is spinning so fast that it causes metal fatigue; a flow hits the performance ceiling of an intermediary component). So it either fails or limits the system's cyclical rate. These 'speed bumps' are (because of the clockwork approach) felt throughout the flow. That is why we put buffers in between two active components. Now we have a greater dynamic range window of operation without speed bumps.

It shouldn't be too difficult to address testing of buffered systems at implementation time. Possibly pragma/compile-time capabilities allowing for injecting 'delay' in the sink side to trivially create "full buffer" conditions and test for it.

There are no golden hammers because the problem domain is not as simple as a nail. Tradeoffs and considerations. I don't think I will ever ditch either (shallow, preferred) buffers or channels. They have their use.


I agree with many of your points, including coroutines being a good abstraction.

The reality is though that you are directly fighting or reimplementing the OS scheduler.

I haven’t found an abstraction that does exactly what I want but unfortunately any sort of structured concurrency tends to end up with coloured functions.

Something like C++ stdexec seems interesting but there are still elements of function colouring in there if you need to deal with async. The advantage is that you can compose coroutines and synchronous code.

For me I want a solution where I don’t need to care whether a function is running on the async event loop, a separate thread, a coprocessor or even a different computer and the actor/CSP model tends to model that the best way. Coroutines are an implementation detail and shouldn’t be exposed in an API but that is a strong opinion.


As you probably know, Rust ended up with async/await. This video goes deep into that and the alternatives, and it changed my opinions a bit: https://www.youtube.com/watch?v=lJ3NC-R3gSI

Golang differs from Rust by having a runtime underneath. If you're already paying for that, it's probably better to do greenthreading than async/await, which is what Go did. I still find the Go syntax for this more bothersome and error-prone, as you said, but there are other solutions to that.


I can see the appeal for simplicity of concept and not requiring any runtime, but it has some hard tradeoffs. In particular the ones around colored functions and how that makes it feel like concurrency was sort of tacked onto the languages that use it. Being cooperative adds a performance cost as well which I'm not sure I'd be on board with.


“Systems programming language” is an ambiguous term and for some definitions (like, a server process that handles lots of network requests) garbage collection can be ok, if latency is acceptable.

Google has lots of processes handling protobuf requests written in both Java and C++. (Or at least, it did at the time I was there. I don’t think Go ever got out of third place?)


It's non-application software meant to support something else at run time. Like a cache, DBMS, webserver, runtime, OS, etc.


My working definition of "systems programming" is "programming software that controls the workings of other software". So kernels, hypervisors, emulators, interpreters, and compilers. "Meta" stuff. Any other software that "lives inside" a systems program will take on the performance characteristics of its host, so you need to provide predictable and low overhead.

GC[0] works for servers because network latency will dominate allocation latency; so you might as well use a heap scanner. But I wouldn't ever want to use GC in, say, audio workloads; where allocation latency is such a threat that even malloc/free has to be isolated into a separate thread so that it can't block sample generation. And that also means anything that audio code lives in has to not use GC. So your audio code needs to be written in a systems language, too; and nobody is going to want an OS kernel that locks up during near-OOM to go scrub many GBs of RAM.

[0] Specifically, heap-scanning deallocators, automatic refcount is a different animal.


I wouldn’t include compilers in that list. A traditional compiler is a batch process that needs to be fast enough, but isn’t particularly latency sensitive; garbage collection is fine. Compilers can and are written in high-level languages like Haskell.

Interpreters are a whole different thing. Go is pretty terrible for writing a fast interpreter since you can’t do low-level unsafe stuff like NaN boxing. It’s okay if performance isn’t critical.


Yes, you can via unsafe.

And if you consider K&R C a systems language, you would do it like back in the day, with a bit of hand written helper functions in Assembly.


You don't (usually) inherit the performance characteristics of your compiler, but you do inherit the performance characteristics of the language your compiler implements.


So that fits, given that Go compiler, linker, assembler and related runtime are all written in Go itself.

You mean an OS kernel, like Java Real Time running bare metal, designed in a way that it can even tackle battleship weapons targeting systems?

https://www.ptc.com/en/products/developer-tools/perc


From what I remember, Go started out because a C++ application took 30 minutes to compile even though they were using Google infrastructure. You could say that they set out to create a systems programming language (they certainly thought so), but mostly I think the real goal was recreating C++ features without the compile time, and in that, they were successful.


is there a language that implements cooperative async-await patterns nicely?


JS, Rust


I mean, they claimed that one didn't need generics in the language for some 12 years or so ...


A lot of the time, a lack of bugfixes comes from the incentive structure management has created. Specifically, you rarely get rewarded for fixing things. You get rewarded for shipping new things. In effect, you're punished for fixing things because that's time you're not shipping new things.

Ownership is another one. For example, product teams are responsible for shipping new things, but support for existing things gets increasingly pushed onto support teams. This is really a consequence of the same incentive structure.

This is partially why I don't think that all subscription software is bad. The Adobe end of the spectrum is bad. The Jetbrains end is good. There is value in creating good, reliable software. If your only source of revenue is new sales then bugs are even less of a priority until it's so bad it makes your software virtually unusable. And usually it took a long while to get there, with many ignored warnings.


Jetbrains still likes to gaslight you and say you are wrong about bugs or features.

Recent example the removal of the commit modal.


The whole New UI debacle really set the tone and expectations and I don't see them changing. They seem like a different company these days? Maybe I didn't really notice in the past.


For me it was when they were copying Adobe UIs and removed colors from icons because "it was distracting".

Nowadays they copy VS Code instead.


JetBrains is dead within 5 years unless they can get their AI game figured out (which they’re not).

Don’t get me wrong, I love JetBrains products. However, their value has been almost exclusively in QoL for devs. AI is drastically cutting the need for that.


The jetbrains model is every new release fixes that one critical bug that's killing you, and adds 2 new critical bugs that will drive you mad. I eventually got fed up and jumped off that train.


Hmm, I’ve pretty much never experienced a bug in JetBrains products.

They’re one of the few products that just amazes me with how robust it is. Often, it will tell me I have issues before I even know about them (e.g my runtime is incorrect) and offer 1-click fixes.


Not really sure what you guys are talking about. I've been using Rider for years and it's been great. I'm using the new UI and I have no problems with commits or anything else.

Recently joined a new team where I have to use VS because we have to work through a remote desktop where I can't install new stuff without a lengthy process, and having used VS for a while now it's so much worse. I miss Rider practically every second I'm writing code. There is nothing that I need that VS does better, it's either the same or usually worse for everything I do.

I hope I'll get a bit more used to it over time but so far I hate it. Feels like it's significantly reducing my velocity compared to Rider.


Where to? There's nothing even remotely comparable for many tech stacks. I've been looking for alternatives for many years (also being fed up with their disregard for bugs and performance), but there are none (except for proper VS for Windows-first C++/C#).


Sadly, I just accepted having worse productivity. I didn't really have a choice, their bugs were actively breaking my workflow, like causing builds to fail. It definitely made me more frustrated and less productive on a day-to-day basis.


Eclipse and Netbeans for Java, QtCreator for C and C++ cross-platform, and VS if on Windows.

If it really must be, VSCode for everything else.

I never was a JetBrains fan, especially given the Android Studio experience, glad that is no longer a concern.


Netbeans is not for real development. Sorry, I love Netbeans. I grew up using it. It just doesn't have good support for real world Java development. As for Eclipse, I'll use notepad over that any day. I've been programming in Java since highschool, 20+ years ago.

IntelliJ is the best there is for Java, warts and all.


How do you do real world JNI development with IntelliJ, including cross language debugging and profiling?

Quite curious of the answer in such great IDE.


I just accepted I wasn’t going to find anything comparable, and just have to bite the bullet and accept software that has way less features, but at least consistently works, and doesn’t randomly decide to run at 800% CPU when a single file changes.

Now on team Zed. We’ll see how long that is good before it enshittifies too. I’m not sure if I should be happy they’re still not charging me for it.


To be fair it seems to average 1:1 with some surge and recede.


When is this finally going to be removed? I'm still reverting to the old dialog on every machine.


In the next release, currently in beta, but they relented and moved it to an unsupported plugin. Not sure if the idea.properties setting, which still works, will be removed.

https://youtrack.jetbrains.com/issue/IJPL-177161/Modal-commi...


> Jetbrains still likes to gaslight you and say you are wrong about bugs or features.

They learned from the best: Microsoft.

Microsoft cannot fix bugs because its "engineers" are busy rounding corners in UI elements.


... but support for existing things get increasingly pushed onto support teams.

And support teams don't fix bugs?


You're removing autonomy from the support team, this will demoralize them.

The issue becomes that you have two teams: one moving fast, adding new features that often seem nonsensical to the support team, and a second one cleaning up afterward. Being on the clean-up crew ain't fun at all.

This builds up resentment, i.e. "Why are they doing this?".

EDIT: If you make it so support team approval is necessary for feature team, you'll remove autonomy from feature team, causing resentment in their ranks (i.e. "Why are they slowing us down? We need this to hit our KPIs!").


On top of that, the support team is often understaffed and overloaded while the feature pushers get more positions.


Some 20+ years ago we solved this by leapfrogging.

  Team A does majority of new features in major release N.
  Team B for N+1.
  Team A for N+2.

  Team A maintains N until N+1 ships.
  Team B maintains N+1 until N+2 ships.


[flagged]


The grammar of your own comment isn't any better.


Xoogler here. I never worked on Fuchsia (or Android) but I knew a bunch of people who did and in other ways I was kinda adjacent to them and platforms in general.

Some have suggested Fuchsia was never intended to replace Android. That's either a much later pivot (after I left Google) or it's historical revisionism. It absolutely was intended to replace Android and a bunch of ex-Android people were involved with it from the start. The basic premise was:

1. Linux's driver situation for Android is fundamentally broken and (in the opinion of the Fuchsia team) cannot be fixed. Windows, for example, spent a lot of time on this issue to isolate issues within drivers to avoid kernel panics. Also, Microsoft created a relatively stable ABI for drivers. Linux doesn't do that. The process of upstreaming drivers is tedious and (IIRC) it often doesn't happen; and

2. (Again, in the opinion of the Fuchsia team) Android needed an ecosystem reset. I think this was a little more vague and, from what I could gather, meant different things to different people. But Android has a strange architecture. Certain parts are in the AOSP but an increasing amount was in what was then called Google Play Services. IIRC, an example was an SSL library. AOSP had one. Play had one.

Fuchsia, at least at the time, pretty much moved everything (including drivers) from kernel space into user space. More broadly, Fuchsia can be viewed in a similar way to, say, Plan9 and micro-kernel architectures as a whole. Some think this can work. Some people who are way more knowledgeable and experienced in OS design are pretty vocal that it can't because of the context-switching. You can find such treatises online.

In my opinion, Fuchsia always struck me as one of those greenfield vanity projects meant to keep very senior engineers around. Put another way: it was a solution in search of a problem. You can argue the flaws in Android architecture are real but remember, Google doesn't control the hardware. At that time at least, it was Samsung. It probably still is. Samsung doesn't like being beholden to Google. They've tried (and failed) to create their own OS. Why would they abandon one ecosystem they don't control for another they don't control? If you can't answer that, then you shouldn't be investing billions (quite literally) into the project.

Stepping back a bit, Eric Schmidt when he was CEO seemed to hold the view that ChromeOS and Android could coexist. They could compete with one another. There was no need to "unify" them. So often, such efforts to unify different projects just lead to billions of dollars spent, years of stagnation and a product that is the lowest common denominator of the things it "unified". I personally thought it was smart not to bother but I also suspect at some point someone would because that's always what happens. Microsoft completely missed the mobile revolution by trying to unify everything under Windows OS. Apple were smart to leave iOS and MacOS separate.

The only fruit of this investment and a decade of effort, so far, is Nest devices. I believe they also tried (and failed) to embed themselves with Chromecast.

But I imagine a whole bunch of people got promoted and isn't that the real point?


This is probably the most complete story told publicly, but there was a lot of timeline with a lot of people in it, so as with any such complicated history "it depends who you ask and how you frame the question": https://9to5google.com/2022/08/30/fuchsia-director-interview...


I remember reading the fuchsia slide deck and being absolutely flabbergasted at the levels of architecture astronautics going on in it. It kept flipping back and forth between some generic PM desire ("users should be able to see notifications on both their phone and their tablet!") to some ridiculous overcomplication ("all disk access should happen via a content-addressable filesystem that's transparently synchronized across every device the user owns").

The slide with all of the "1.0s" shipped by the Fuchsia team did not inspire confidence, as someone who was still regularly cleaning up the messes left by a few select members, a decade later.


+1

I worked on the Nest HomeHub devices, and the push to completely rewrite an already-shipped product from web/HTML/Chromecast to Flutter/Fuchsia was one of the most insane, pointless wastes of money and goodwill I've seen in my career. The Fuchsia teams were allowed to grow to seemingly infinite headcount and make delivery promises they could not possibly satisfy -- miss them, then continue with new promises to miss -- while the existing software stack was left to basically rot, and was disrespected. Eventually they just killed the whole product line, so what was the point?

It was exactly the model of how not to do large scale software development.

Fuchsia the actual software looks very cool. Too bad it was Google doing it.


Linux's ever evolving ABI is a feature, not a bug. It's how Linux maintains technical excellence. I'll take that over a crusty backwards compatibility layer written 30 years ago that is full of warts.


Driver instability is hardly a feature


To summarize:

1. Over time, profits will tend to decrease. The only way to sustain or increase profits is to cut costs or increase prices;

2. Executive compensation is tied to short-term profit making and/or (worse) the share price;

3. The above leads to every aspect of a company becoming financialized. We put the accountants in charge of everything. If you look at pretty much any company (eg Boeing) you can trace back its downfall to an era of making short-term profit decisions;

4. Intel has spent ~$152 billion in share buybacks over the last 35 years [1]. Why they need any subsidies is beyond me;

5. We keep giving money to these companies without getting anything in return. We fund research. Pharma companies get to profit off that without giving anything back. We bailed out banks after 2008. Why didn't we just nationalize those failing banks, restructure them and then sell them (like any other bankruptcy)? We hand out subsidies with no strings attached. A lot of political hay is made out of "welfare" abuse. Well, the biggest form of welfare abuse is corporate welfare;

6. It is important to maintain a high corporate tax rate. Why? Because a low corporate tax rate means there is little to no cost to returning money to shareholders instead of investing in the business. You make $1 billion in profits. What do you do? If you invest it in the business, that spending is deductible, so you get to spend the full $1 billion. If you pay a dividend or do a buyback, you get to give back $790 million (@ 21% corporate tax rate). Now imagine that corporate tax rate was 40% instead. It completely changes the decision-making process. A rough worked sketch of this follows below the list.

7. The Intel of 20 years ago was a fabrication behemoth that led the industry. It's crazy how far it's fallen and how it's unable to produce anything. It's been completely eclipsed by TSMC. Looking back, the decade-long delays with 10nm should've set off alarm bells at many points along the way.

8. There is no downside to malfeasance by corporate executives. None. In a just world, every one of the Sacklers would die penniless in a prison cell.

[1]: https://www.commondreams.org/opinion/intel-subsidy-chips-act...
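
For point 6, a rough sketch of the arithmetic I have in mind (simplified: it assumes reinvestment is fully deductible, and the rates and the $1B figure are just round illustrative numbers):

  # Illustrative only: compares what $1B of pre-tax profit buys if reinvested
  # (deductible, so you spend the whole dollar) versus distributed (taxed
  # first), under two hypothetical corporate tax rates.
  PROFIT = 1_000_000_000

  def distributable(profit, corporate_rate):
      """Cash left for dividends/buybacks after corporate tax."""
      return profit * (1 - corporate_rate)

  for rate in (0.21, 0.40):
      print(f"at {rate:.0%}: reinvest ${PROFIT:,} vs pay out ${distributable(PROFIT, rate):,.0f}")

  # at 21%: reinvest $1,000,000,000 vs pay out $790,000,000
  # at 40%: reinvest $1,000,000,000 vs pay out $600,000,000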


> Now imagine that corporate tax rate was 40% instead. It completely changes the decision-making process.

Seems more like a question of degree. Dividends are also taxed as income, so roughly 36% is already paid in tax overall, depending on the income of the shareholder. Increasing the corporate tax rate to 40% brings the effective tax rate to ~52%.
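
Roughly, the compounding looks like this (a sketch only; it ignores deductions and brackets, and the ~20% dividend rate is just an assumption for illustration):

  # Corporate tax and dividend tax compound, because the dividend
  # is paid out of profit that has already been taxed once.
  def effective_rate(corporate, dividend):
      return 1 - (1 - corporate) * (1 - dividend)

  print(f"{effective_rate(0.21, 0.20):.0%}")  # ~37% at the current 21% corporate rate
  print(f"{effective_rate(0.40, 0.20):.0%}")  # ~52% if the corporate rate were 40%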

In my experience there's a more fundamental problem with large companies. In a small company, the best way to succeed as an individual (whatever position you have) is for the company as a whole to succeed. At a very large company, the best way to succeed is to be promoted up the ladder, whatever the cost. This effect is the worst at the levels just below the top: you have everything to lose and nothing to gain by the company being successful. It's far more effective to sabotage your peers and elevate yourself rather than work hard and increase the value of the company by a couple of percentage points.

The thing is, the people that have been there since the beginning still have the mindset of helping the company as a whole succeed, but after enough time and enough people have been rotated out, you're left with people at the top who only care about the politics. To them the company is simply a fixture - it existed before them and will continue to exist regardless of what they do.


You're alluding to the double taxation problem with dividends. This is a problem and has had a bunch of bad solutions (eg the passthrough tax break from 2017) when in fact the solution is incredibly simple.

In Australia, dividends come with what are called "franking credits". Imagine a company has a $1 billion profit and wants to pay that out as a dividend. The corporate tax rate is 30%. $700M is paid to shareholders. It comes with $300M (30%) in franking credits.

Let's say you own 1% of this company. When you do your taxes, you've made $10M in gross income (1% of $1B), been paid $7M and have $3M in tax credits. If your tax rate is 40% then you owe $4M on that $10M, but you've already effectively paid $3M of it.

The point is, the net tax rate on your $10M gross payout is still whatever your marginal tax rate is. There is no double taxation.
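
A simplified sketch of that arithmetic (it glosses over the edge cases in the actual Australian rules):

  # Dividend imputation in a nutshell: you're assessed on the grossed-up
  # dividend at your own marginal rate, and the company tax already paid
  # counts as a credit -- so the income is only ever taxed once.
  company_profit  = 1_000_000_000
  corporate_rate  = 0.30
  ownership       = 0.01
  marginal_rate   = 0.40

  gross_income    = company_profit * ownership            # $10M attributed to you
  cash_dividend   = gross_income * (1 - corporate_rate)   # $7M actually paid out
  franking_credit = gross_income * corporate_rate         # $3M tax the company paid

  tax_assessed = gross_income * marginal_rate             # $4M at your marginal rate
  tax_to_pay   = tax_assessed - franking_credit           # $1M you still owe

  kept = cash_dividend - tax_to_pay                       # $6M in your pocket
  print(kept / gross_income)                              # 0.6, i.e. taxed once at 40%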

That being said, dividends have largely given way to share buybacks. Some of the reasons are:

1. It's discretionary. Not every shareholder wants the income. Selling on the open market lets you choose if you want money or not;

2. Share buybacks are capital gains and generally enjoy lower tax rates than income;

3. Reducing the pool of available shares puts upward pressure on the share price; and

4. Double taxation of dividends.

There are some who demonize share buybacks specifically. I'm not one of them. It's simply a vehicle for returning money to shareholders, functionally very similar to dividends. My problem is doing either to the point of destroying the business.


Good points, but AFAIK the bank loans from 2008 were paid back with interest; that was definitely not free money. I would focus on root causes instead of shallow populist statements like that: too little regulation and oversight allowed the creation of securities that should never have existed in the first place.

No industry will self-regulate; as you write, the lure of short-term bonuses for execs is too high and the punishment for failure is non-existent. I expect the current US admin will make this even worse; greed and short-term profit seem to be the only focus.


I'm all for root cause analysis. A big part of that is that large companies become extremely risk-tolerant because history has shown there is little to no downside to their actions. If the government always bails you out, what incentive is there to be prudent? You may as well fly close to the Sun and pay out big bonuses now. Insolvency is a "next quarter" problem.

I'm aware that TARP funds were repaid. Still, a bunch of that money went straight into bonuses [1]. Honestly, I'd rather the company be seized, restructured and sold.

You know who ends up making sacrifices to keep a company afloat? The labor force. After 2008, auto workers took voluntary pay cuts, gave up benefits and otherwise did what they could to keep their companies afloat; it took them ~15 years of fighting to get those benefits back. In a just world, executive compensation would go down to $1 until such a time as labor's sacrifices were repaid.

[1]: https://www.theguardian.com/business/2009/jul/30/bank-bonuse...


On #6, that's an individual income tax (or capital gains tax, depending on how you define things). Corporate income tax is the one that applies regardless of whether the money is reinvested in the corporation or distributed.

I don't think you should subsidize reinvesting in huge companies anyway. What do you expect to gain from them becoming larger?

It's much better (for society) to let them send the money back to shareholders so they can invest it in something else.


Reinvesting in the company is the one thing we should absolutely subsidize. That goes to wages, capital expenditure and other measures to sustain and grow the company.

Paying out dividends and doing share buybacks just strips the company for cash until there's nothing of value left. It's why enshittification is a thing.


Treating all wages as expenses seems fine to me. But have you noticed that large companies just stop growing at some point, no matter how much money you pour into them?

That is, unless they use the extra capital to buy legally-enforced monopolies, or bribe regulators out of their way.

And no, enshittification is a thing because people want those companies to grow and grow, and keep growing. Sometimes even after they already have the majority of humanity as customers.


Speaking as a former Google Fiber software engineer, I'm honestly surprised this is still around.

In 2017, basically all the Google Fiber software teams went on hiatus (mine included). I can't speak to the timing or rationale, but my theory is that Google leadership couldn't decide whether the future of the Internet was wired or wireless, and a huge investment in wired might be invalidated if it turned out to be wireless. So rather than guess wrong, leadership simply decided to definitely lose by mothballing the whole thing.

At that time, several proposed cities were put on hiatus, some of which had already hired local people. In 2019, Google Fiber exited Louisville, KY, paying penalties for doing so [1]. That really seemed like the end.

I also speculated that Google had tried or was trying to sell the whole thing. I do wonder if the resurrection it seems to have undergone is simply a result of the inability to find a buyer. I have no information to suggest that one way or the other.

There were missteps along the way. A big example was the TV software that was originally an acquisition, SageTV [2]. Somebody decided it would be a good idea to completely rewrite this Java app into Web technologies on an embedded Chrome instance on a memory-limited embedded CPU in a set-top box. Originally planned to take 6 months, it took (IIRC) 3.5+ years.

But that didn't actually matter at all in the grand scheme of things because the biggest problem and the biggest cost was physical network infrastructure. It is incredibly expensive and most of the issues are hyperlocal (eg soil conditions, city ordinances) as well as decades of lobbying by ISPs of state and local governments to create barriers against competition.

[1]: https://arstechnica.com/information-technology/2019/04/googl...

[2]: https://arstechnica.com/information-technology/2011/06/googl...


> In 2019, Google Fiber exited Louisville, KY, paying penalties for doing so

Those mistakes in Louisville were huge. Literally street-destroying mistakes that the city's civil engineers predicted and fought against, but Google Fiber made them anyway. It left a huge bill for the city's taxpayers. It wasn't bigger news and a bigger upset because of the NDAs and other contract protections involved, but as an outsider to those NDAs/contracts, I can say it was an incredibly bad job on too many fronts and should have left Google Fiber with a much more tarnished reputation than it did.


> There were missteps along the way. A big example was the TV software that was originally an acquisition, SageTV [2]. Somebody decided it would be a good idea to completely rewrite this Java app into Web technologies on an embedded Chrome instance on a memory-limited embedded CPU in a set-top box. Originally planned to take 6 months, it took (IIRC) 3.5+ years.

I worked on the "misstep" with a small team, and it’s wild to see Fiber still around and even expanding to new cities. As far as I can tell, the set-top box software had nothing to do with why Fiber was scaled down. Also, usability surveys showed people really liked the GUI!

The client supported on-demand streaming, live TV, and DVR on hardware with... let’s call them challenging specs. Still, it turned out to be a pretty slick app. We worked hard to keep the UI snappy (min 30 FPS), often ditching DOM for canvas or WebGL to squeeze out the needed performance. A migration to Cobalt [1], a much lighter browser than embedded Chromium, was on the table, but the project ended before that could happen.

Personally, it was a great experience working with the Web Platform (always a solid bet) on less-traditional hardware.

--

1: https://developers.google.com/youtube/cobalt


+1 to what was said above; the UI didn't take 3.5 years to make - we launched it fairly quickly and then continued to improve on it. Later there was a large UX refresh, so maybe that's where OP is getting confused? Either way, that software continued to work for years after the team moved on to other projects. SageTV was good, but the UI wasn't Java - it was a custom XML-like layout.


> In 2017, basically all the Google Fiber software teams went on hiatus (mine included).

What does a hiatus entail in this case? Did these teams all just stop working on Fiber stuff and sit around all day hoping they would be given something to do?


They laid us all off. They had huge plans - millions of users! Then they intersected reality in KC, where all people wanted was 5Mbit service and free TV... There were many, many people working to perfect the set-top box, for example. We got fq_codel running on the wifi, but we never got anywhere on the shaper. The plan was to move 1M+ units of that horrible integrated chip (the Comcerto C2000 - it didn't have coherent cache in some cases); I think they barely cracked 100k before pulling the plug on it all...

and still that box was better than what most fiber folk have delivered to date.

At least some good science was done about how ISPs really work... and published.

https://netdevconf.org/1.1/talk-measuring-wifi-performance-a...


> They laid us all off.

I think you mean "they advanced their amazing bet".

https://fiber.google.com/blog/2016/10/advancing-our-amazing-...


That's fascinating actually. You should consider doing a full blog writeup if that's something you're into.


Too bitter. I referenced a little of that "adventure" here, in 2021... gfiber was attempting to restart by refreshing their now-obsolete hardware... https://blog.cerowrt.org/post/trouble_in_paradise/


I was thinking the same thing. Not to mention that when Google Fiber was first announced, I was happy to be all-in on Google services, but now I'd be hesitant to use them for anything more than what I'm already tied to.

