Karpathy: "context engineering" over "prompt engineering" (twitter.com/karpathy)
27 points by Michelangelo11 6 months ago | 4 comments


This is key

Besides model capabilities, one of the most important aspects of AI-assisted development right now is context management

Cursor et al. try to automate that for the user, and it works up to a point. But at a certain level of complexity, the user needs to get actively involved in managing the context

It also seems that the people who report very good results with agentic coding take a lot of care in managing their Cursor rules or claude.md files


I do like "context engineering" better. I also agree that there's a lot that goes into getting good answers out of LLMs, and "GPT wrapper" is a gross oversimplification of many of the products being built on top of them. Just putting good evals in place is often a complicated task.


That's true. We have been trying to help customers do evals for ages now, and it's super hard for everyone to build a really good dataset and define great quality metrics
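
For illustration, a minimal sketch (in Python, with purely hypothetical names) of what a hand-rolled eval usually involves: a labeled dataset, a call to the system under test, and a quality metric. The dataset and the metric are exactly the parts that are hard to get right.

    # Hypothetical sketch of a classic eval loop; every name is a placeholder.
    dataset = [
        {"input": "Cancel my subscription", "expected_intent": "cancellation"},
        {"input": "Where is my invoice?", "expected_intent": "billing"},
        # ...collecting enough of these, with good coverage, is the hard part
    ]

    def run_agent(user_input: str) -> str:
        # Stand-in for the real system under test (an LLM call, a chain, an agent).
        raise NotImplementedError

    def evaluate(dataset) -> float:
        correct = 0
        for example in dataset:
            predicted = run_agent(example["input"])
            # Simplest possible metric: exact match on a routed intent.
            # Real quality metrics are usually fuzzier and task-specific.
            if predicted == example["expected_intent"]:
                correct += 1
        return correct / len(dataset)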

Just wanted to shamelessly plug this lib I built recently for this very topic. It's been much easier to sell to our clients than evals, because it's closer to e2e tests: https://github.com/langwatch/scenario

Instead of 100 examples, it's easier for people to think of just the anecdotal example where the problem happens and let AI expand it, or to replicate a situation from prod and describe the criteria in simple terms or code
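
As a rough illustration of that scenario-style approach (placeholder names only, not the actual API of the library linked above), such a test reads more like an e2e test: replicate one concrete situation, state the criteria in plain terms, and let a simulated user plus an LLM judge exercise the agent.

    # Hypothetical sketch of a scenario-style test; not the real `scenario` API.
    from dataclasses import dataclass

    @dataclass
    class ScenarioResult:
        passed: bool
        transcript: list[str]

    def run_scenario(agent, situation: str, criteria: list[str]) -> ScenarioResult:
        # Placeholder: a real runner would let a simulated user play out the
        # situation against the agent, then ask a judge model whether every
        # criterion was met in the resulting conversation.
        raise NotImplementedError

    def test_double_charge_refund():
        result = run_scenario(
            agent=None,  # the agent under test would go here
            situation="Customer was double-charged and asks for a refund",
            criteria=[
                "agent acknowledges the duplicate charge",
                "agent offers a refund without asking for card details again",
            ],
        )
        assert result.passed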


I love the term! But I do think it's both, really. After all this time, LLMs are still very finicky; even the order of the instructions still matters a lot, even with the right context, so you are still prompt engineering. Ideally this will go away and only context engineering will remain




