I've heard the argument made that LLMs are simply models overfit on the entirety of the internet. Double descent is a good counterpoint to that claim: it suggests something more interesting than simple overfitting is happening at large parameter counts.
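For anyone who hasn't seen double descent plotted, here's a toy sketch (my own construction, nothing LLM-specific, and the exact numbers depend on the random seed): random ReLU features fit with minimum-norm least squares. Train error hits zero once the feature count reaches the number of training points, test error typically spikes right around that interpolation threshold, and then falls again as the model keeps growing.

    # Toy double-descent demo: 1-D regression with random ReLU features,
    # fit by minimum-norm least squares (what lstsq returns when the
    # system is underdetermined). Watch test MSE peak near n_feat ~= n_train.
    import numpy as np

    rng = np.random.default_rng(0)
    n_train, n_test = 40, 500
    x_train = rng.uniform(-1, 1, n_train)
    x_test = rng.uniform(-1, 1, n_test)
    target = lambda x: np.sin(4 * x)
    y_train = target(x_train) + 0.3 * rng.normal(size=n_train)
    y_test = target(x_test)

    def features(x, w, b):
        # random ReLU features: phi_j(x) = max(0, w_j * x + b_j)
        return np.maximum(0.0, np.outer(x, w) + b)

    for n_feat in [5, 10, 20, 40, 80, 160, 640]:
        w = rng.normal(size=n_feat)
        b = rng.normal(size=n_feat)
        phi_tr = features(x_train, w, b)
        phi_te = features(x_test, w, b)
        theta, *_ = np.linalg.lstsq(phi_tr, y_train, rcond=None)
        train_mse = np.mean((phi_tr @ theta - y_train) ** 2)
        test_mse = np.mean((phi_te @ theta - y_test) ** 2)
        print(f"{n_feat:4d} features  train {train_mse:.3f}  test {test_mse:.3f}")

The point being: past the interpolation threshold, adding parameters can make generalization better rather than worse, which is exactly why "it's just overfit on the internet" isn't a complete story.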
I find it hard to believe that every single problem has a solution on the internet, which is what "overfitting" would imply.
I was playing Fallout: New Vegas on Wine, and for some reason, the music on my pipboy wasn't playing. I searched the internet for the Wine errors from the terminal to no avail, and as a last resort asked ChatGPT. It gave me step-by-step instructions on how to fix it, and it worked.
If that doesn't demonstrate that LLMs have some kind of internal model of the world and understanding of it, then I don't know what will.
An alternative explanation is that your google-fu is not as good as OpenAI's crawlers or the WebText corpus (which can go a lot deeper than any of your searches).
Makes me wonder how Google would compare if a good portion of the entire world hadn't spent decades trying to game their system. OpenAI created a completely new system and reaps the benefits of training on data that hasn't been twisted-half-way-to-hell to exploit it.
The fact that it is useful for finding patterns different from the ones humans tend to find is not an indication that it understands the underlying data.
It is no different from clustering in traditional stats. While the patterns it finds are sometimes incredibly useful, clustering knows nothing outside of the provided dataset.
As others have mentioned, Google's search results are actually really bad at finding novel results these days, due to many factors like battling SEO tricks, etc.
But while the results from LLMs are impressive, there is no mechanism for them to have an 'internal model of the world' in their current form.
It may help to remember that current LLMs would require an infinite amount of RAM just to be computationally (Turing) complete right now.
> The fact that it is useful for finding patterns different from the ones humans tend to find is not an indication that it understands the underlying data.
Without invoking your own self-awareness as an argument, how do you know that other people "understand" stuff, and aren't merely "finding patterns"? In other words, in what way do you define "understanding", such that you can be sure that LLMs have no such thing?
> there is no mechanism for them to have an 'internal model of the world' in their current form.
How do you know that? We don't even know why humans have an internal model of the world. What if internal modelling of the world is just sufficiently-complex pattern-matching?
If the clustering has been run on what amounts to basically the entire world's information, though, things get a bit fuzzy. You may not be technically incorrect; it's just that these words lose practical meaning when we're talking about models that encode tens or even hundreds of billions of parameters.
Predicting the "next token" requires an "internal model of the world". It might not be built the way ours is, but without something that acts like one, I'd be very interested to hear how you think the model comes up with its predictions.
Let's say it needs to continue a short story about a detective. The detective says at the end: "[...] I have seen every clue and thought of every scenario. I will tell you who the killer is:". Good luck continuing that with any sort of accuracy if you don't have some abstract map of how "people" act. You can see how easy it is to come up with examples that require something that acts as a model of the "world".
There's a definite structure and pattern to everything we do. This context, hidden from the LLM, is what gives rise to the words we write. To reproduce those words, as it has to, the model must basically conjure up all of this hidden state. I'm not saying it gets it right, I'm just saying that there is no other way than to model the world behind the text to even get into ballpark-right territory. The sketch below shows what that next-token prediction looks like in practice.
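To make "predicting the next token" concrete, here's a quick sketch with an off-the-shelf model (GPT-2 purely as a small stand-in, via the Hugging Face transformers library; the detective prompt is a shortened version of the one above). The model outputs a probability distribution over the next token conditioned on everything that came before, and whatever it uses to rank the candidates has to come from internal state built up from that text.

    # Inspect the next-token distribution a causal LM assigns after the prompt.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    prompt = ("I have seen every clue and thought of every scenario. "
              "I will tell you who the killer is:")
    inputs = tok(prompt, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

    # probability distribution over the very next token
    next_token_probs = torch.softmax(logits[0, -1], dim=-1)
    top = torch.topk(next_token_probs, k=10)
    for p, idx in zip(top.values, top.indices):
        print(f"{p.item():.3f}  {tok.decode(int(idx))!r}")

With no actual clues in this short prompt the top candidates come out fairly generic, but in a full story the distribution shifts toward the characters the text has set up, and that conditioning is the part that seems to need something like a model of the situation.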
I've been through this forum, many Reddit posts, and other sites; none of the solutions worked. What worked was ChatGPT figuring out that I needed to add the following line to ~/.wine/system.reg:
If you happen to find these exact instructions anywhere on the internet, please share, as that will be enough to convince me that LLMs aren't anything more than glorified search engines.
Otherwise, I can't help but be skeptical. If nothing else, it's plausible that LLMs have some kind of internal representation of the world.
I'd be interested to know if anyone has studied how overfitting translates into the domain of LLM output. It's easy to understand when you're fitting a line through data or building a classifier: you overfit, your test-set loss ends up higher than your training-set loss, and that directly translates into worse performance of the model. For an LLM picking probable next words, what's the analogy, and if the test-set loss is higher, does that actually make the output "worse"?
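One concrete handle on the question (my framing, not anything established in this thread): for an LLM, train/test loss is just average next-token cross-entropy, usually reported as perplexity, measured on held-out text. Here's a minimal sketch of measuring it, again with GPT-2 via transformers as a stand-in and a throwaway sentence standing in for genuinely held-out data. An overfit model would show a gap between this number on training documents and on unseen ones; how much of that gap is actually felt in the generations is the open part of the question.

    # Held-out next-token cross-entropy (and perplexity) for a causal LM.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    held_out = "The quick brown fox jumps over the lazy dog."
    enc = tok(held_out, return_tensors="pt")

    with torch.no_grad():
        # passing labels makes the model return mean next-token cross-entropy
        loss = model(**enc, labels=enc["input_ids"]).loss

    print(f"cross-entropy per token: {loss.item():.3f}")
    print(f"perplexity: {torch.exp(loss).item():.1f}")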