Exactly my experience too. Anyone who says they're able to solve "very complex" problems with LLMs is clearly not working on objectively complex problems.
I'm not usually micro-managing it, that's the point.
I sometimes do on problems where I have particular insight, but mostly I find it far more effective to give it test cases and instructions on how to approach a task, and then let it iterate with little to no oversight.
I'm letting Claude Code run for longer and longer with --dangerously-skip-permissions, to the point where I'm pondering rigging up something that just keeps feeding it "continue" and runs it in parallel on multiple problems.
Because at least when you have a good way of measuring success, it works.
If this is the maximum AGI-PhD-LRM can do, that'll be disappointing compared to the investments. Curious to see what all this will become in a few years.