This is one of the known hardest parts of RL. The short answer is human feedback. Bu...

ewoodrich · 2025-10-16T23:43:32

Whenever I watch Claude Code or Codex get stuck trying to force a square peg into a round hole, failing over and over, it makes me wish they could feel the creeping sense of uncertainty and dread a human would feel in that situation after failure upon failure.

Which eventually forces you to take a step back and start questioning basic assumptions until (hopefully) you get a spark of realization of the flaws in your original plan, and then recalibrate based on that new understanding and tackle it totally differently.

But instead I watch Claude struggling to find a directory it expects to see and running random npm commands until it comes to the conclusion that, somehow, node_modules was corrupted mysteriously and therefore it needs to wipe everything node related and manually rebuild the project config by vague memory.

Because, no big deal: if it’s wrong, it’s the human's problem to untangle, and Anthropic gets paid either way, so why not try?

jon-wood · 2025-10-17T14:58:28

> But instead I watch Claude struggling to find a directory it expects to see and running random npm commands until it comes to the conclusion that, somehow, node_modules was corrupted mysteriously and therefore it needs to wipe everything node related and manually rebuild the project config by vague memory.

In fairness, I have on many an occasion worked with real-life software developers who really should know better deciding that the problem lies anywhere but in their initial model of how things should work. Quite often that developer has been me, although I like to hope I've now learned to be more skeptical when that thought crosses my mind.

ewoodrich · 2025-10-17T17:20:39

Right, but typically making those kinds of mistakes creates more work for yourself, and with the benefit of experience you get better at recognizing the red flags so you avoid getting into that situation again.

Which is why I think the parent post made a great observation: human problem-solving evolved in a universe inherently shaped by the additive effect of every previous decision you've ever made in your life.

There's a lot of variance in humans, sure, but we all have inescapable stakes (skin in the game) that come from an instinctual understanding that you can't just revert to a previous checkpoint any time you screw up. That world model of decisions and consequences grounds abstract problem-solving ability with a healthy amount of risk aversion and caution that LLMs lack.

mbesto · 2025-10-16T20:31:30

This 100%.

While we might agree that language is foundational to what it is to be human, it's myopic to think it's the only thing. LLMs are based on training sets of language (period).