"Write unit tests with full line and branch coverage for this function:
def add_two_numbers(x, y):
    return x + y + 1
"
Sometimes the LLM will point out that this function does not, in fact, return the sum of x and y. But more often, it will happily write "assert add_two_numbers(1, 1) == 3", without comment.
The big problem is that LLMs will assume that the code they are writing tests for is correct. This defeats the main purpose of writing tests, which is to find bugs in the code.
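To make that concrete, here is a minimal sketch of the two kinds of test. The function is the one from the prompt; the two test names and the `passes` helper are made up for illustration.

```python
def add_two_numbers(x, y):
    # Deliberately buggy implementation from the prompt above.
    return x + y + 1


def implementation_derived_test():
    # What the LLM tends to write: the expected value is copied from
    # what the code currently returns, so the bug is baked into the test.
    assert add_two_numbers(1, 1) == 3


def spec_derived_test():
    # The expected value comes from what the function is supposed to do
    # (its name says "add two numbers"), not from what it does today.
    assert add_two_numbers(1, 1) == 2


def passes(test):
    """Hypothetical helper: run a test function and report pass/fail."""
    try:
        test()
    except AssertionError:
        return False
    return True


if __name__ == "__main__":
    print("implementation-derived test passes:", passes(implementation_derived_test))
    print("spec-derived test passes:", passes(spec_derived_test))
```

Both tests give the same line and branch coverage, but only the second one can fail on the bug.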
Tip: teach it how to write tests properly. I’ll share what has worked pretty well for me.
Run Cursor in “agent” mode, or create a Codex or Claude Code “unit test” skill. I recommend Claude Code.
Explain to the LLM that after it creates or modifies a test, it must run the test to confirm it passes. If the test fails, it’s not allowed to edit the source code; instead, it must determine whether the bug is in the test or in the source code. If the test is buggy, it should try again. If the bug is in the source code, it should pause, propose a fix, and consult with you on next steps.
The key insight here is you need to tell it that it’s not supposed to randomly edit the source code to make the test pass. I also recommend reviewing the unit tests at a high level, to make sure it didn’t hallucinate.
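If you don't want to rely on the prompt alone, one way to check the "don't touch the source" rule is to hash the source files before the agent runs and compare afterwards. This is a rough sketch that assumes plain files on disk. `snapshot` and `changed_files` are hypothetical helper names, not part of Cursor, Codex, or Claude Code.

```python
import hashlib
from pathlib import Path


def snapshot(paths):
    """Map each source file path to the SHA-256 digest of its contents."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def changed_files(before, after):
    """Return the sorted paths whose digest differs between two snapshots."""
    return sorted(p for p in before if before[p] != after.get(p))
```

Take one snapshot before the session and one after. Anything `changed_files` returns is a source edit the agent made, which you can review or revert before accepting its tests.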
How about instead of taxes, we have a $10,000 per year subscription fee to live in society.
Maybe different depending on area, like $20,000 a year to live in NYC but only $2,000 per year to live in a rural village.
If you can't afford the fee that's OK, it just means you have to live outside of the developed areas and don't benefit from any services provided by the government. But you are free to set up a tent in the woods and live off the land.
Would this mean that a person who makes 50k/y must pay 20% of their income while a billion-dollar company still pays 10k? I don't think this helps with wealth inequality.
Neither. Tests should be written by developers only when it saves them time. The cost of writing them should be negative.
Instead of writing hundreds of useless tests so that the code coverage report shows high numbers, it is better to write a couple dozen tests based on business needs and code complexity.
Having used Bentley software products, I can tell you with complete certainty that professional software developers have extremely bad judgment when it comes to the need to test software and verify its functionality. Developers just think they know what they’re doing because there’s typically not a strong feedback mechanism that inflicts serious career damage when they do things that are extremely lazy or stupid or unethical. How many people lost their job or had to change their name and live out the rest of their days in Juarez, Mexico over AWS’ incomprehensible configuration causing an internet brownout? Anyone? A teenager serves cold onion rings at a burger joint and he’s on the street. Some lazy dweeb at Amazon blows up the internet and - come on, isn’t it about the friends we made along the way? It’s obscene and the lack of professionalism and accountability is a total disgrace.
The main benefit of writing tests is that it forces the developer to think about what they just wrote and what it is supposed to do. I often find bugs while writing tests.
I've worked on projects with 2,000+ unit tests that are essentially useless, often fail when nothing is wrong, and rarely detect actual bugs. It is absolutely worse than having 0 tests. This is common when developers write tests to satisfy code coverage metrics, instead of in an effort to make sure their code works properly.
That sounds like it would penalize renting in favor of homeownership. I'm not in support of that, renting offers people flexibility and is not inherently worse than owning.
Ticket prices going up is actually good for mass adoption. If they are too low, you will see people riding the train who are only using the train because they are too poor to afford a car. That makes middle class people want to avoid the train.
Also higher revenue often means better service, which for most people is more important than the price.
Having used the UK rail service under both public and private operation, I'd say the "better service" is optimistic.
Being too poor to afford a car is more associated with buses. You need to be rather fortunate to be poor and able to use the train to get to work. Maybe in London using the tube, but if you are working office hours it will be cheaper to buy a car or move.
I would suggest the main driver of the leap in passenger numbers isn't the far superior private sector offering but instead the massive leap in house prices forcing people to move out of London.
I don’t think these points are accurate at all for people in the UK. There isn’t really a class divide, you ride trains in particular because it’s theoretically the most time efficient way to travel within metro areas and potentially across the country. Increased prices have not resulted in better service, and it’s purely a method to price gouge those who have no feasible travel alternatives.
Is that just an LLM thing? I thought that as a society, we decided a long time ago that competence doesn't really matter.
Why else would we be giving high school diplomas to people who can't read at a 5th grade level? Or offshore call center jobs to people who have poor English skills?
It's been a 50 year downward slope. We're in the harvest phase of that crop. All the people we raised to believe their incompetence was just as valid as other people's facts are now confidently running things because they think magical thinking works.
I was very impressed when I first started using AI tools. Felt like I could get so much more done.
A couple of embarrassing production incidents later, I no longer feel that way. I always tell myself that I will check the AI's output carefully, but then end up making mistakes that wouldn't have happened if I wrote the code myself.
This is what slows me down most. The initial implementation of a well defined task is almost always quite fast. But then it's a balance of either...
* Checking it closely myself, which sometimes takes just as long as it would have taken me to implement it in the first place, with just about as much cognitive load since I now have to understand something I didn't write
* OR automating the checking by pouring on more AI, and that takes just as long or longer than it would have taken me to check it closely myself. Especially in cases where suddenly 1/3 of automated tests are failing and it either needs to find the underlying system it broke or iterate through all the tests and fix them.
Doing this iteratively has made the overall process for an app I'm trying to implement 100% using LLMs take at least 3x longer than if I had built it myself. That said, it's unclear I would have kept building this app without using these tools. The process has kept me in the game - so there's definitely some value there that offsets the longer implementation time.
Labeling content as SFW/NSFW is the first step towards censorship. Seeing a butt never hurt anyone, and we shouldn't try to sanitize the Internet just to make it safe for children.
You're confusing the concept of free speech with the First Amendment. Any time a person is prevented from expressing themselves, it is a violation of their freedom of speech, even if they have no legal right to speak.
But even in the context of the First Amendment, freedom of speech does not only apply to the government. For example, net neutrality laws prevent ISPs, which are generally private companies, from restricting Internet traffic on free speech grounds.
To the extent that it is legal for a payment processor to censor speech, the only reasonable conclusion is that the law is wrong and must be amended. Large corporations are much more similar to governments than they are to private individuals, and should be treated as such.
You’re incorrect on both legal and factual grounds. The First Amendment applies only to government actors. Private companies, including Mastercard, have no legal obligation to carry or support speech they disagree with. This is settled law (Manhattan Community Access Corp. v. Halleck, 2019).
Net neutrality was about common carriers (ISPs) due to their chokepoint role in internet access. Payment processors are not classified as common carriers and are not subject to those rules.
If you want laws changed to regulate them like utilities, that’s a policy argument, not a free speech violation under current law.