MoreQARespect's comments | Hacker News

Testing is probably my favorite topic in development and I kind of wish I could make it my "official" specialty but no way in hell am I taking a pay cut and joining the part of the org nobody listens to.

This paper has 7 references, and 4 of them are to a single Google blog post that treats test flakiness as an unavoidable fact of life rather than a class of bug which can and should be fixed.

Aside from the red flag of one blog post being >50% of all citations, it is also the saddest blog post Google ever put their name to.

There is very little of interest in this paper.


Because while devs with specialties usually get paid more than generalists, for some reason testing as a specialty means a pay cut and a loss of respect and stature.

Hence my username.

I wouldn't ever sell myself as a test automation engineer, but whenever I join a project the most broken technical issue in need of fixing is nearly always test automation.

I typically brand this work as architecture (and to be fair there is overlap) and try to build infra and tooling less skilled devs can use to write spec-matching tests.

Sadly, if I called it test automation I'd have to take a pay cut and get paid less than the less skilled devs who need to be trained to do TDD.


I think there are 3 'kinds' of QA who are not really interchangeable as their skillsets don't really overlap.

- Manual testers who don't know how to code at all, or at least aren't good enough to be tasked with writing code

- People who write automated tests (who might or might not also do manual testing)

- People writing test automation tools, managing and designing test infra, etc. These people are regular engineers with engineering skillsets. I don't think there's generally a difference in treatment or compensation, but I also don't really consider this 'QA work'

As for QA getting paid less - I don't agree with this notion, but I see why it happens. Imo an ideal QA is someone who's just as skilled in most things as a dev (they just do something a bit different) and has the same level of responsibility and capacity for autonomy - in exchange, I'd argue they deserve the same recognition and compensation. Not giving them that leads to the best and brightest leaving for other roles.

It's amazing when you get to work with great QA: you can rest easy that anything you make will get tested properly, you get high quality bug reports, and bugs don't come back from the field.

It also bears mentioning - self-evident to me, but maybe not to everyone - that devs should be expected to do a baseline level of QA work themselves: verify the feature is generally working and write a couple of tests to make sure this is indeed the case (which means they have to know how to write decent tests).


>Reminds me of TDD bandwagon which was all the rage when I started programming. It took years to slowly die out and people realized how overhyped it really was.

It never really went away. The problem is that there is a dearth of teaching materials telling people how to do it properly:

* E2E test first

* Write high level integration tests which match requirements by default

* Only start writing lower level unit tests when a clear and stable API emerges.

and most people when they tried it didn't do that. They mostly did the exact opposite:

* Write low level unit tests which match the code by default.

* Never write higher level tests (some people don't even think it's possible to write an integration or e2e test with TDD because "it has to be a unit test").
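
To make the contrast with the first list concrete, here's a minimal sketch of what "requirements first" looks like in Python with pytest. Everything here (myapp, App, the login flow) is a hypothetical stand-in; the point is only that the first failing test states a requirement against real wiring, not an imagined unit:

    from myapp import App  # hypothetical entry point wiring up the real app


    def test_registered_user_can_log_in():
        # Written before any implementation exists, so it starts red. It states
        # a requirement ("a registered user can log in"), not a code structure.
        app = App()
        app.register(email="a@example.com", password="hunter2")
        session = app.log_in(email="a@example.com", password="hunter2")
        assert session.is_authenticated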


Not even sure the problem is just education.

For something complex, it’s kinda hard to write and debug high level tests when all the lower level functionality is missing and just stubbed out.

We don't expect people to write working software before it can even be executed, yet we expect people to write (and complete) all tests before the actual implementation.

Sure, for trivial things it's definitely doable. But then extensive tests wouldn't be needed for such things either!

Imagine someone developing an application where the standard C library was replaced with a stub implementation… That wouldn’t work… Yet TDD says one should be able to do pretty much the same thing…


>Imagine someone developing an application where the standard C library was replaced with a stub implementation… That wouldn’t work… Yet TDD says one should be able to do pretty much the same thing…

No, it doesn't say you should do that. TDD says red-green-refactor; that is all. You can and should do that with an e2e test or integration test and a real libc; to do otherwise would be ass backwards.

Yours is exactly the unit testing dogma I was referring to, which people have misunderstood as being part of TDD due to bad education.


Would you be able to share any links that expand upon your recommended approach? It makes complete sense to me as a self-taught dev, and it's what I've always done (most recently, an e2e test of a realtime CDC ETL pipeline, checking for, logging, and fixing various things along the way until I was getting the right final output). I rarely write unit tests. It would be good to read something more formal in support of what I've naturally gravitated towards.


No, but I have a feeling I should write one because I keep running into this misunderstanding.

It makes it really hard to recommend TDD when people believe they already know what it is but are doing it ass backwards.


I just remembered this essay from the creator of HTMX. My approach is very similar.

https://htmx.org/essays/codin-dirty/


TDD failed because it was sold as a method for writing better tests, when in reality it was a very challenging skill to learn: a way of writing software that involved a fundamental change in how you approached requirements engineering, software development, iteration and testing. Even with a skilled team, the cost to adopt TDD would be very high for an uncertain outcome. So people tried shortcuts like the ones you described, and you can't blame them. The whole movement was flawed and unrealistic in its expectations and communications.


1. Write test that generates an artefact (e.g. picture) where you can check look and feel (red).

2. Write code that makes it look right, running the test and checking that picture periodically. When it looks right, lock in the artefact which should now be checked against the actual picture (green, if it matches).

3. Refactor.

The only criticism I've heard of this is that it doesn't fit some people's conceptions of what they think TDD "ought to be" (i.e. some bullshit with a low level unit test).
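
A minimal sketch of that workflow in Python with pytest; render_chart and the golden-file paths are hypothetical stand-ins:

    import pytest
    from pathlib import Path

    from myapp.charts import render_chart  # hypothetical renderer returning PNG bytes

    GOLDEN = Path("tests/golden/chart.png")      # the locked-in artefact
    CANDIDATE = Path("tests/output/chart.png")   # the artefact to inspect


    def test_chart_matches_locked_in_artefact():
        actual = render_chart(data=[1, 2, 3])
        if not GOLDEN.exists():
            # Red: nothing locked in yet. Emit the artefact for a human to check.
            CANDIDATE.parent.mkdir(parents=True, exist_ok=True)
            CANDIDATE.write_bytes(actual)
            pytest.fail(f"Inspect {CANDIDATE}; copy it to {GOLDEN} to lock it in")
        # Green: once locked in, the artefact is checked on every run.
        assert actual == GOLDEN.read_bytes()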


You can even do this with LLM-as-judge. Feed screenshots into an LLM judge panel and get them to rank the design 1-10. Give the panel a few different perspectives/models to get a good distribution of ranks, and establish a rank floor for test passing.
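
Something like this hypothetical sketch (rate_screenshot stands in for whatever multimodal LLM call you'd use; it is not a real library):

    from pathlib import Path
    from statistics import mean

    from my_llm import rate_screenshot  # assumed wrapper returning an int 1-10

    PERSONAS = [
        "a minimalist UI designer",
        "a first-time user on a cheap phone",
        "an accessibility auditor",
    ]
    RANK_FLOOR = 7  # the panel's mean score must clear this for the test to pass


    def test_design_clears_judge_panel():
        screenshot = Path("tests/output/homepage.png").read_bytes()
        scores = [rate_screenshot(screenshot, persona=p) for p in PERSONAS]
        assert mean(scores) >= RANK_FLOOR, f"panel scored {scores}"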


Parent mentioned "subjective look and feel"; LLMs are absolutely trash at that and have no subjective taste. You'll get the blandest designs out of LLMs, which makes sense considering how they were created and trained.


LLMs can get you to about a 7.5-8/10 just by iterating on their own output. The main thing you have to do is wireframe the layout and give the agent a design that you think is good to target.


Again, they have literally zero artistic vision, and no, you cannot get an LLM to create a 7.5 out of 10 web design or anything else artistic, unless you too lack the faculties to properly judge what actually works and looks good.


You can get an AI to produce a 10/10 design trivially by taking an existing 10/10 design and introducing variation along axes that are orthogonal to user experience.

You are right that most people wouldn't know what 10/10 design looks/behaves like. That's the real bottleneck: people can't prompt for what they don't understand.


Yeah, obviously, if you're talking about copying/cloning, but that's not what I thought the context was. I thought we were talking about LLMs themselves being able to create something that would look and feel good to a human, without just "copy this design from here".


That only works for the simplest, minimally interactive examples.

It is also so monumentally brittle that if you do this for interactive software, you will drive yourself nuts trying.


The reason property testing isn't used that much is that it only catches bugs in a specific type of code, which most people aren't writing.


I'm not sure that's true. In essence, property tests are a method for defining types where a language lacks natural expression. In a vacuum, nearly all code could benefit from (more advanced) types. But:

1. Tradeoffs, as always. The more advanced the typing you head towards, the more time consuming it becomes to reason about the program. There is good reason why even the most staunch type advocates rarely push for anything more advanced than monads. A handful of assertive tests is usually good enough, while requiring significantly less effort.

2. Not just time consuming, but often beyond comprehension. Most developers just don't know how to think in terms of formal proofs. Throw a language with an advanced type system, like Coq or Idris, in front of them and they wouldn't have a clue what to do with it (even ignoring the unfamiliar syntax). And with property tests, you're asking them not only to think in advanced types, but to effectively define the types themselves from scratch. Despite #1, I fully expect we would still see more property testing if it weren't for this huge impediment.
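
For anyone unfamiliar with the technique, a minimal property test sketch using Python's Hypothesis library; note the assertions state properties of sorting rather than example outputs, exactly the "types the language can't express" idea:

    from hypothesis import given, strategies as st


    @given(st.lists(st.integers()))
    def test_sorting_properties(xs):
        out = sorted(xs)
        assert len(out) == len(xs)                         # length is preserved
        assert sorted(out) == out                          # sorting is idempotent
        assert all(a <= b for a, b in zip(out, out[1:]))   # output is ordered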


>Most developers just don't know how to think in terms of formal proofs

Formal proofs are useful on the same class of bug that property tests are.

And vice versa.

The issue isn't necessarily that devs can't use them; it's that the problems which cause most of their bugs do not map onto the space of "what formal proofs are good at".


What do you consider to be the source of most bugs?


I have. I call it snapshot test driven development. You put the preconditions in, generate and record the graphics as an artefact at runtime, and when it looks right, freeze it.


But that isn't TDD; no line of code should be written without a failing test.


Yes it is. Until the visually validated artefact is locked in, it is still a failing test.

You can argue semantics until you're blue in the face; it still follows red-green-refactor and it confers the same benefits as TDD.


Your nickname tells me you are not talking bs.


I sincerely hope aliens don't end up being space Americans. Earth has oil and WMDs.


I find it's better to write the kind of tests you can generate docs from.

The docs don't go out of date and you don't have to keep checking them for hallucinations.

Better to save AI for tasks where it's less damaging if it screws up.
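
A minimal sketch of the docs-from-tests idea in Python; the account module is hypothetical, and the collector is just one simple way to turn test docstrings into a living spec:

    import inspect
    import sys

    import pytest

    from myapp.accounts import Account, InsufficientFunds  # hypothetical module


    def test_withdrawal_rejected_when_balance_insufficient():
        """A withdrawal larger than the balance is rejected; the balance is unchanged."""
        account = Account(balance=50)
        with pytest.raises(InsufficientFunds):
            account.withdraw(100)
        assert account.balance == 50


    def generate_spec() -> str:
        # Collect the docstring of every test in this module into a markdown
        # bullet list - "docs" that can never drift from the tested behaviour.
        module = sys.modules[__name__]
        lines = ["# Account spec", ""]
        for name, fn in inspect.getmembers(module, inspect.isfunction):
            if name.startswith("test_") and fn.__doc__:
                lines.append("- " + inspect.cleandoc(fn.__doc__))
        return "\n".join(lines)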


From reading that, my guess would be that the IP your hosting provider assigned you had some spammy history before you started hosting your blog on it.

Either that or your DNS provider hosts a lot of spam.
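
You can check that directly by querying the Spamhaus ZEN DNS blocklist; here's a quick Python sketch (the IP is a placeholder, and note Spamhaus may refuse queries coming via large public resolvers):

    import socket


    def is_listed_on_spamhaus(ip: str) -> bool:
        # DNSBLs are queried by reversing the octets and appending the zone:
        # 203.0.113.7 -> 7.113.0.203.zen.spamhaus.org
        query = ".".join(reversed(ip.split("."))) + ".zen.spamhaus.org"
        try:
            socket.gethostbyname(query)  # any A record back means "listed"
            return True
        except socket.gaierror:
            return False


    print(is_listed_on_spamhaus("203.0.113.7"))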


Hmmm, I use https://njal.la/ for DNS. Could Spamhaus really just auto-mark every njalla user as suspicious?


Yeah, possibly. Privacy-related services are often used by spammers.

