
Don't they have a significant RL component? The "we'll just make it bigger" idea that was peddled a lot after GPT-3.5 was nonsense, but scaling isn't the only thing they're doing right now.


"We'll just make it bigger" works. RLVR just gives better performance gains and spends less inference compute - as long as you have a solid way of verifying the tasks.

A simplified way of thinking about it: pretraining gives LLMs raw features, SFT arranges those features into useful configurations, and RLVR glues them together so they work well, especially over long reasoning traces. It makes sense to combine all three in practice.
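As a toy skeleton of that stacking - every function here is a stub I invented, not any lab's API; the point is just the ordering and the hand-off of weights from stage to stage, ending in RL against a verifier like the one sketched above:

  def pretrain(corpus):
      # Next-token prediction at scale: learns the raw features.
      return {"stage": "base"}

  def sft(model, demos):
      # Supervised finetuning on (prompt, response) pairs:
      # arranges the features into useful behaviors.
      return {**model, "stage": "sft"}

  def rlvr(model, tasks, reward_fn):
      # RL against a programmatic verifier: glues the behaviors
      # together across long reasoning traces.
      return {**model, "stage": "rlvr"}

  model = rlvr(sft(pretrain("web text"), "demos"), "math tasks",
               reward_fn=verifiable_reward)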

How much pretraining gives an LLM depends, among other things, on that LLM's scale. But raw scale is bounded by hardware capabilities and economics - of training, and especially of inference.

Scale is still quite desirable - GPT-4.5-scale models are going to become the norm for high-end LLMs quite soon.


I'm not against "we'll make it bigger" (though whether it hits diminishing returns is as yet unknown - 4.5 isn't exactly remembered as a great release); I'm against "we'll just (i.e. 'only') make it bigger".

I'm doubtful you'd have useful LLMs today if labs hadn't also scaled up post-training.



