They are fancy Markov chains in the same way that your computer is a finite state machine. It is technically true, but it is an extremely unhelpful way of thinking about them.
Alan Turing wouldn't think of a modern computer as a finite state machine either; he would think of it as a very good approximation of a Turing machine.
Until it is helpful, that is: when you need the computer to process a sequence of data in a finite-state way, similar to, say, a regular expression… #thisisfun
"All models are wrong, some are useful". You could also look at them as a highly compressed PGM representing every possible conversation you could have in every language the model knows that has been trimmed to only the highly probable paths through training. In that view word prediction is really navigating nodes and this would explain their ability to enforce global constraints as well as they do :) There isn't going to be one "correct" way to view them and different perspectives might offer different insights.
A Markov chain satisfies the Markov property [1], in which the probability of the next event is conditioned only on the current event. Regardless of whether an LLM event is a character or a word, an LLM certainly does not follow a Markov process.
If you consider the entire input for the language model as the state, and consider the output to be the input concatenated with the next token, then it's a Markov chain.
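A minimal sketch of that framing; the next_token_distribution function here is a made-up stand-in for a model's forward pass, not any real API:

    import random

    def next_token_distribution(state):
        # Hypothetical stand-in for an LLM forward pass: given the full
        # context (the state), return probabilities for the next token.
        return {"the": 0.5, "a": 0.3, "<eos>": 0.2}

    def markov_step(state):
        # The transition depends only on the current state, which is exactly
        # the Markov property when the state is the whole sequence so far.
        dist = next_token_distribution(state)
        tokens, probs = zip(*dist.items())
        token = random.choices(tokens, weights=probs)[0]
        return state + (token,)   # new state = old state plus the next token

    state = ("Once", "upon", "a", "time")
    state = markov_step(state)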
But that's only if you don't use beam search, which LLMs do use. With beam search and a beam width of 4 (typical for LLMs), for example, the decoder does a tree search that keeps track of the 4 highest-probability outputs. At that point I guess it's not a Markov chain process anymore; it just uses a Markov chain.
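For concreteness, here is a rough sketch of what beam-width-4 decoding does, again using a made-up next_token_distribution as a stand-in for a real model:

    import math

    def next_token_distribution(seq):
        # Hypothetical model output: next-token probabilities given the sequence.
        return {"the": 0.5, "cat": 0.3, "<eos>": 0.2}

    def beam_search(prompt, steps=5, beam_width=4):
        beams = [(0.0, tuple(prompt))]          # (log probability, token sequence)
        for _ in range(steps):
            candidates = []
            for logp, seq in beams:
                if seq[-1] == "<eos>":
                    candidates.append((logp, seq))   # finished beams carry over
                    continue
                for token, p in next_token_distribution(seq).items():
                    candidates.append((logp + math.log(p), seq + (token,)))
            # Keep only the beam_width highest-probability outputs.
            beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        return beams[0][1]   # best sequence found

    print(beam_search(("Once", "upon")))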
I agree with the general point that there is, at present, far too much uncertainty to understand the long-term effects of LLMs. It will be somewhere on the scale between "semi-useful toy," "ends white-collar work as we know it," and "accidentally destroys mankind." Beyond that, there isn't much we can say.
The A.I. knows who you are. The A.I. knows where you are. The A.I. just wants you to be happy. Are you happy? (The A.I. preps a "Happiness Compliance Team" for deployment.)
It is. I've been waiting for the brilliant take, the one that makes me say, "wow, this really gets to the essence of the whole thing." I haven't found it yet. It's not the blurry jpeg one, it's not the "old-school computational linguistics/GOFAI had it right" ones, it's not the one with various Mario analogies. By having at its heart, "I don't know," this one perhaps comes closest.
By all means, please write the brilliant take yourself, and I will happily upvote and reference it.
I, for one, would be more interested in a take on the opposite end of the problem: what does it all mean for our understanding of what thinking/talking/being human is? Because what kind of baffles me with LLMs is how much they can do, from how little they are made of, compared to our (or at least my) general expectations for AGI.
I've been thinking about the same thing, and have to admit that I, most of the time, AM a stochastic parrot. Being a parrot saves a lot of energy, which is probably the reason.
But it seems our brain has other mechanisms that get turned on occasionally (or run in the background) that go much beyond that.
In many ways yes, but more interesting to this subject in particular: no. We are actually underestimating the human involvement in LLM behavior by personifying the LLM itself.
An LLM can only "exhibit" the behavior that humans encode into text. We encode explicit behavior, including symbols and grammar, and we encode implicit behavior, including narrative and logic.
An LLM blindly models the patterns of text, but doing that exposes the power of the patterns themselves.
The only problem is that an LLM can't determine which patterns are useful and which patterns aren't.
Agreed, as with a sibling comment this is how I've been thinking about it. We overestimate what we are, the corollary being that we underestimate what other creatures are.
My (non-brilliant) take is that LLMs are basically faster, cheaper versions of Mechanical Turk (Amazon's, not the 18th century automaton).
Like Mechanical Turk, you need to "program" by giving English instructions and the results can be (depending on the instructions) non-deterministic.
But Mechanical Turk did not revolutionize computing. Can a faster and cheaper version do so? Sometimes incremental improvements turn into paradigm shifts. But sometimes not. I guess we'll see (which is another lukewarm take--sorry).
Not a full take, but I have some relevant experience that’s formulating into an idea.
1) Running an agency, it was quick and easy to share ideas and give direction and have that executed to a fairly accurate degree over a decent period of time.
2) Working with GPT-3 & GPT-4, it’s also quick and easy to share ideas, but I’m becoming more aware of how surface level my communication is. When I get back great results, it’s typically because I’m requesting busy work. When I’m requesting something novel, it quickly becomes clear what I forgot to define.
So, the idea that’s formulating around LLMs is that the process of transforming an idea into instructions will become much more desirable. And that we’re moving from selecting for people who know how to do the work, to selecting for people who know how to request the work. And that those are two very different skill sets.
Sure, but as long as the human experience remains unique and beyond the grasp of AI, then there will be human creativity and ingenuity seeking to improve that experience.
But yes, there are plenty of demonstrations of LLMs using themselves to accomplish tasks or even recruiting humans to accomplish a task.
Here goes: LLMs don't behave, they exhibit behavior.
Whose behavior? Whoever wrote the training corpus and the prompt. So far, that definitely means one or more humans.
The problem is that nearly everything you have read about LLMs has personified them. The character of an LLM as a person is invented, then conclusions about LLMs are drawn from that character.
Nearly everything that is interesting about LLMs is not actually an LLM feature: it's a language feature. Language is an incredibly powerful tool, and an incredibly versatile data format.
What is the data? Human behavior.
When a human writes text, they don't just spit out characters at random: humans explicitly choose the characters they write, and implicitly choose the characters they don't write.
In the most literal sense, text contains the entropy of a human making a string of choices about which character to write next. That's a 1-dimensional ordered list, just like the string of characters is. Text gets a lot more interesting when we introduce language...
The reason a human chooses one character over another is not 1-dimensional. There are patterns of entropy driving that decision, and we explicitly know about some of them.
We have defined words, punctuation, grammar, idioms, etc. that we structure text with. We call these patterns language. These are the explicit patterns that allow us to hack the writing process: instead of encoding a 1-dimensional list of decisions, we encode a recursive many-dimensional structure of symbols and grammar.
Now for the interesting part: when humans write language into text, our intentions become explicit patterns, but they aren't the only patterns that end up there.
Every arbitrary decision about writing style gets implicitly encoded as a pattern of negative space. The reasons why we write one concept instead of another get implicitly encoded, too. It's a bit lossy, but most of these patterns end up in the text.
---
So what does an LLM do? It implicitly models patterns, and presents them.
That's it. That's the only behavior. A "continuation" is made by modeling the prompt, and showing what's "next" in the model.
An LLM explicitly knows nothing at all; but implicitly, it knows everything. It models every pattern that exists in the text, but it doesn't have a clue what or why it is modeling. A pattern is a pattern.
The whole system ends up generating valid language, because most of the implicit patterns an LLM models "align to" the explicit patterns humans (language) encoded into the text.
The whole system ends up "exhibiting behavior", because it also modeled the implicit patterns of human behavior (narrative) that are also encoded into the text. Patterns of narrative determine which part of the model a prompt explores, and which pattern is considered "next".
---
Next time you read about the "features" and the "limitations" of an LLM (including GPT-4), know that you are really reading about the "features" of language itself, and the "features" of the narratives that were written in the training corpus and prompts.
The behavior that an LLM exhibits has much less to do with how well its model "aligns with" language grammar, and much more to do with how well the text itself (and the narrative it contains) will behave.
If physical reality is quantized, which many believe to be true, then the state of all of reality can be described symbolically. Therefore, a large enough language model could conceivably model physical reality. At some point, we could construct an LLM powerful enough to manipulate the lowest-level quanta of reality given a description of the manipulation in another language. The tricky part is resolving ambiguity, as the only truly unambiguous description is the direct description of the changes at the subatomic level.
What is the language of the human mind? Of consciousness itself? If an LLM can learn that, perhaps we will build a technology that takes an intention and is immediately able to manifest the corresponding changes to reality to satisfy the intention or desire? When we get there, perhaps it will be time to look inward and do some of that spiritual inner work we keep putting off?
What’s exciting to me about LLMs is they seem to be one step closer to this vision, with all the peril and all the possibility that comes with it.
You're right: when I say quantized, I do mean finite, countable, representable symbolically. I realize how staggeringly large finite can be.
We are in the midst of exponential growth in our ability to manipulate reality. The mastery of physical reality feels achievable to me now. I could be wrong, but I could also be right. It’s a fun time to be alive.
But they are fancy, expensive Markov chains. The lesson here is that fancy, expensive Markov chains can do a lot, not that the statement is wrong.