
Were the tests generated by an AI then? How do you know whether they are really comprehensive?




Yes, effectively my entire project was generated by AI.

It's a very weird and uncomfortable way of working - I've said in the past that I don't like a single line of unreviewed AI-generated code in anything beyond a prototype, and now here I am with 13,000+ lines of mostly unreviewed Python written by Claude Opus 4.5.

I'm leaving the alpha label on it until I'm a whole lot more comfortable with the codebase!

I do, however, know that the tests are pretty comprehensive, because I had the model use TDD from the very start - write a test, watch it fail, then implement code to make it pass.

I was able to keep an eye on what it was doing from my phone while it worked, and the TDD process seemed to stay honest.

Here's one example from the full transcript, showing how it implemented closures: https://static.simonwillison.net/static/2025/claude-code-mic...
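For anyone who hasn't worked that way: each step has roughly this shape. This is a generic pytest sketch, not the interpreter code from the transcript - make_adder is just a stand-in feature used to show the red/green loop.

    # Sketch of one TDD step with pytest. make_adder is a stand-in
    # feature, not the project's actual code.


    def test_adder_closure_captures_n():
        # Step 1 (red): commit this test first and run it - it fails with
        # a NameError because make_adder does not exist yet.
        add_three = make_adder(3)
        assert add_three(4) == 7
        assert add_three(10) == 13


    # Step 2 (green): add the smallest implementation that makes it pass.
    def make_adder(n):
        def add(x):
            return x + n
        return add

The "watch it fail" part is what keeps the loop honest: if the test already passes before the implementation exists, it isn't testing anything.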


I'm now having Claude Code build the tests for my voicenotes organization application. For the most basic implementation - just a single text field - I wrote out in English which tests I knew I needed; there were about two dozen: values approaching size limits, Unicode, normalization, nonprinting characters, Hebrew vowel points, empty strings vs. NULL strings, byte length exceeded without the character length being exceeded, and so on. I then threw Claude Code at it.

Claude Code found more edge cases to write tests for than I ever would have thought of. And I've been doing this for 20 years.
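To give a flavor of that test matrix, here's a rough pytest sketch. validate_text and its limits are made-up stand-ins, not my actual code - the point is the shape of the cases, not the validator.

    # Sketch of edge-case tests for a single text field. validate_text()
    # and its limits are assumed stand-ins, not real application code.
    import unicodedata

    import pytest

    MAX_CHARS = 255   # assumed character limit
    MAX_BYTES = 512   # assumed UTF-8 byte limit


    def validate_text(value):
        """Toy validator: NFC-normalize and enforce the assumed limits."""
        if value is None:
            raise ValueError("NULL is not a valid text value")
        normalized = unicodedata.normalize("NFC", value)
        if "\u0000" in normalized:
            raise ValueError("nonprinting NUL character")
        if len(normalized) > MAX_CHARS:
            raise ValueError("too many characters")
        if len(normalized.encode("utf-8")) > MAX_BYTES:
            raise ValueError("too many bytes")
        return normalized


    @pytest.mark.parametrize("value", [
        "a" * MAX_CHARS,                         # exactly at the character limit
        "\u05e9\u05b8\u05c1\u05dc\u05d5\u05dd",  # Hebrew with vowel points (niqqud)
        "caf\u00e9",                             # precomposed NFC form
        "cafe\u0301",                            # decomposed NFD form, should normalize
    ])
    def test_accepts_and_normalizes(value):
        assert validate_text(value) == unicodedata.normalize("NFC", value)


    @pytest.mark.parametrize("value", [
        None,                                    # NULL is not the same as empty
        "a" * (MAX_CHARS + 1),                   # one character over the limit
        "\U0001F600" * 200,                      # 200 chars but 800 UTF-8 bytes
        "bad\u0000value",                        # embedded nonprinting character
    ])
    def test_rejects(value):
        with pytest.raises(ValueError):
            validate_text(value)


    def test_empty_string_is_valid_but_distinct_from_null():
        assert validate_text("") == ""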



