
We have pretty much plateaued in base model performance since GPT-4. It's mostly tooling and integration now. The target is also AGI, so no matter your product, you will be measured on your progress towards it. With new "SOTA" models popping up left and right, you also have no good way to retain users, because the user is mostly interested in the model's performance, not the funny meme generator you added. Looking at you, OpenAI...

"They called me bubble boy..." - some dude at Deutsche.



So, how do you feel about the recent IMO stuff? Doesn't it pose a consistency problem for your view that we've plateaued? To me at least, it felt like we were something like two years away from this kind of thing.

Probably very expensive to run, of course, perhaps ridiculously so, but they were able to solve really difficult maths problems.


> So, how do you feel about the recent IMO stuff?

It's not real; they are cheating on benchmarks. (Just like the previous many times this was announced.)


The biological brain of the top human IMO guy runs on 20 watts. I wonder how much electricity Google used to match that performance.


What is the training cost of such a human? Reliability is another concern. There is no manufacturer you can pay 10 billion and get a few thousand trained processors from.


And they still couldn’t solve P6. All that power to perform worse than many human contestants.


>We are pretty much plateauing in base model performance since gpt4.

Reasoning models didn't even exist at the time, and LLMs were struggling a lot with math. It's completely different now with SOTA models; there have been massive improvements since GPT-4.


The transformer paper was published in 2017, and within 8 years (fewer, if I'm being honest), we have bots that have passed the Turing test. To people with longer memories, passing the Turing test was a big deal.

My point is that even if things are plateauing, a lot of these advancements happen in step-change fashion. All it takes is one or two good insights to make massive leaps, and the fact that things are plateauing now is a bad predictor for how things will be in the future.



