I've heard the argument made that LLMs are simply models overfit on the entirety of the internet. Double descent is a good counterpoint to that claim: it suggests something more interesting than simple overfitting is happening at large parameter counts.
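For anyone who hasn't seen double descent plotted, here's a toy sketch (my own construction, nothing LLM-specific, and the exact numbers depend on the random seed): random ReLU features fit with minimum-norm least squares. Train error hits zero once the feature count reaches the number of training points, test error typically spikes right around that interpolation threshold, and then falls again as the model keeps growing.

    # Toy double-descent demo: 1-D regression with random ReLU features,
    # fit by minimum-norm least squares (what lstsq returns when the
    # system is underdetermined). Watch test MSE peak near n_feat ~= n_train.
    import numpy as np

    rng = np.random.default_rng(0)
    n_train, n_test = 40, 500
    x_train = rng.uniform(-1, 1, n_train)
    x_test = rng.uniform(-1, 1, n_test)
    target = lambda x: np.sin(4 * x)
    y_train = target(x_train) + 0.3 * rng.normal(size=n_train)
    y_test = target(x_test)

    def features(x, w, b):
        # random ReLU features: phi_j(x) = max(0, w_j * x + b_j)
        return np.maximum(0.0, np.outer(x, w) + b)

    for n_feat in [5, 10, 20, 40, 80, 160, 640]:
        w = rng.normal(size=n_feat)
        b = rng.normal(size=n_feat)
        phi_tr = features(x_train, w, b)
        phi_te = features(x_test, w, b)
        theta, *_ = np.linalg.lstsq(phi_tr, y_train, rcond=None)
        train_mse = np.mean((phi_tr @ theta - y_train) ** 2)
        test_mse = np.mean((phi_te @ theta - y_test) ** 2)
        print(f"{n_feat:4d} features  train {train_mse:.3f}  test {test_mse:.3f}")

The point being: past the interpolation threshold, adding parameters can make generalization better rather than worse, which is exactly why "it's just overfit on the internet" isn't a complete story.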
I find it hard to believe that every single problem has a solution on the internet, which is what "overfitting" would imply.
I was playing Fallout: New Vegas on Wine, and for some reason, the music on my pipboy wasn't playing. I searched the internet for the Wine errors from the terminal to no avail, and as a last resort asked ChatGPT. It gave me step-by-step instructions on how to fix it, and it worked.
If that doesn't demonstrate that LLMs have some kind of internal model of the world and understanding of it, then I don't know what will.
An alternative explanation is that your google-fu is not as good as OpenAI's crawlers or the WebText corpus (which can go a lot deeper than any of your searches).
Makes me wonder how Google would compare if a good portion of the entire world hadn't spent decades trying to game their system. OpenAI created a completely new system and reaps the benefits of training on data that hasn't been twisted-half-way-to-hell to exploit it.
The fact that it is useful for finding patterns different from the ones humans tend to find is not an indication that it understands the underlying data.
It is no different from clustering in traditional stats. While the patterns it finds are sometimes incredibly useful, clustering knows nothing outside of the provided dataset.
As others have mentioned, Google's search results are actually really bad at finding novel results these days, due to many factors like battling SEO tricks, etc.
But while the results from LLMs are impressive, there is no mechanism for them to have an 'internal model of the world' in their current form.
It may help to remember that current LLMs would require an infinite amount of RAM just to be computationally (Turing) complete right now.
> The fact that it is useful for finding patterns different from the ones humans tend to find is not an indication that it understands the underlying data.
Without invoking your own self-awareness as an argument, how do you know that other people "understand" stuff, and aren't merely "finding patterns"? In other words, in what way do you define "understanding", such that you can be sure that LLMs have no such thing?
> there is no mechanism for them to have an 'internal model of the world' in their current form.
How do you know that? We don't even know why humans have an internal model of the world. What if internal modelling of the world is just sufficiently-complex pattern-matching?
If the clustering has been run on what amounts to basically the entire world's information, though, things get a bit fuzzy. You may not be technically incorrect; it's just that these words lose practical meaning when we're talking about models that encode tens or even hundreds of billions of parameters.
Predicting the "next token" requires an "internal model of the world". It might not be built the way ours is, but without something that acts like one, I'd be very interested to hear how you think the model comes up with its predictions.
Let's say it needs to continue a short story about a detective. The detective says at the end: "[...] I have seen every clue and thought of every scenario. I will tell you who the killer is:". Good luck continuing that with any sort of accuracy if you don't have some abstract map of how "people" act. You can see how easy it is to come up with examples that require something that acts as a model of the "world".
There's a definite structure and pattern to everything we do. This context, hidden from the LLM, is what gives rise to the words we write. To reproduce those words, as it has to, the model must basically conjure up all of this hidden state. I'm not saying it gets it right, I'm just saying that there is no other way than to model the world behind the text to even get into ballpark-right territory. The sketch below shows what that next-token prediction looks like in practice.
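To make "predicting the next token" concrete, here's a quick sketch with an off-the-shelf model (GPT-2 purely as a small stand-in, via the Hugging Face transformers library; the detective prompt is a shortened version of the one above). The model outputs a probability distribution over the next token conditioned on everything that came before, and whatever it uses to rank the candidates has to come from internal state built up from that text.

    # Inspect the next-token distribution a causal LM assigns after the prompt.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    prompt = ("I have seen every clue and thought of every scenario. "
              "I will tell you who the killer is:")
    inputs = tok(prompt, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

    # probability distribution over the very next token
    next_token_probs = torch.softmax(logits[0, -1], dim=-1)
    top = torch.topk(next_token_probs, k=10)
    for p, idx in zip(top.values, top.indices):
        print(f"{p.item():.3f}  {tok.decode(int(idx))!r}")

With no actual clues in this short prompt the top candidates come out fairly generic, but in a full story the distribution shifts toward the characters the text has set up, and that conditioning is the part that seems to need something like a model of the situation.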
I've been through this forum, many Reddit posts, and other sites; none of the solutions worked. What worked was ChatGPT figuring out that I needed to add the following line to ~/.wine/system.reg:
If you happen to find these exact instructions anywhere on the internet, please share, as that will be enough to convince me that LLMs aren't anything more than glorified search engines.
Otherwise, I can't help but be skeptical. If nothing else, it's plausible that LLMs have some kind of internal representation of the world.
I'd be interested to know if anyone has studied how overfitting translates into the domain of LLM output. It's easy to understand when you're fitting a line through data or building a classifier: you overfit, your test-set loss ends up higher than your training-set loss, and that directly translates into worse performance of the model. For an LLM picking probable next words, what's the analogy, and if the test-set loss is higher, does that actually make the output "worse"?
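One concrete handle on the question (my framing, not anything established in this thread): for an LLM, train/test loss is just average next-token cross-entropy, usually reported as perplexity, measured on held-out text. Here's a minimal sketch of measuring it, again with GPT-2 via transformers as a stand-in and a throwaway sentence standing in for genuinely held-out data. An overfit model would show a gap between this number on training documents and on unseen ones; how much of that gap is actually felt in the generations is the open part of the question.

    # Held-out next-token cross-entropy (and perplexity) for a causal LM.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    held_out = "The quick brown fox jumps over the lazy dog."
    enc = tok(held_out, return_tensors="pt")

    with torch.no_grad():
        # passing labels makes the model return mean next-token cross-entropy
        loss = model(**enc, labels=enc["input_ids"]).loss

    print(f"cross-entropy per token: {loss.item():.3f}")
    print(f"perplexity: {torch.exp(loss).item():.1f}")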