They are fancy Markov chains in the same way that your computer is a finite state machine. It is technically true, but it is an extremely unhelpful way of thinking about them.
Alan Turing wouldn't think of a modern computer as a finite state machine either; he would think of it as a very good approximation of a Turing machine.
Until it is helpful, that is: when you need the computer to process a sequence of data in a finite-state way, similar to, say, a regular expression… #thisisfun
"All models are wrong, some are useful". You could also look at them as a highly compressed PGM representing every possible conversation you could have in every language the model knows that has been trimmed to only the highly probable paths through training. In that view word prediction is really navigating nodes and this would explain their ability to enforce global constraints as well as they do :) There isn't going to be one "correct" way to view them and different perspectives might offer different insights.
A Markov chain satisfies the Markov property [1], in which the probability of the next event is conditioned only on the current event. Regardless of whether an LLM event is a character or a word, an LLM certainly does not follow a Markov process.
If you consider the entire input for the language model as the state, and consider the output to be the input concatenated with the next token, then it's a Markov chain.
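A minimal sketch of that framing; the next_token_distribution function here is a made-up stand-in for a model's forward pass, not any real API:

    import random

    def next_token_distribution(state):
        # Hypothetical stand-in for an LLM forward pass: given the full
        # context (the state), return probabilities for the next token.
        return {"the": 0.5, "a": 0.3, "<eos>": 0.2}

    def markov_step(state):
        # The transition depends only on the current state, which is exactly
        # the Markov property when the state is the whole sequence so far.
        dist = next_token_distribution(state)
        tokens, probs = zip(*dist.items())
        token = random.choices(tokens, weights=probs)[0]
        return state + (token,)   # new state = old state plus the next token

    state = ("Once", "upon", "a", "time")
    state = markov_step(state)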
But that's only if you don't use beam search, which LLMs do use. With beam search and a beam width of 4 (typical for LLMs), for example, the decoder does a tree search that keeps track of the 4 highest-probability outputs. At that point I guess it's not a Markov chain process anymore; it just uses a Markov chain.
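For concreteness, here is a rough sketch of what beam-width-4 decoding does, again using a made-up next_token_distribution as a stand-in for a real model:

    import math

    def next_token_distribution(seq):
        # Hypothetical model output: next-token probabilities given the sequence.
        return {"the": 0.5, "cat": 0.3, "<eos>": 0.2}

    def beam_search(prompt, steps=5, beam_width=4):
        beams = [(0.0, tuple(prompt))]          # (log probability, token sequence)
        for _ in range(steps):
            candidates = []
            for logp, seq in beams:
                if seq[-1] == "<eos>":
                    candidates.append((logp, seq))   # finished beams carry over
                    continue
                for token, p in next_token_distribution(seq).items():
                    candidates.append((logp + math.log(p), seq + (token,)))
            # Keep only the beam_width highest-probability outputs.
            beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        return beams[0][1]   # best sequence found

    print(beam_search(("Once", "upon")))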
I agree with the general point that there is, at present, far too much uncertainty to understand the long-term effects of LLMs. It will be somewhere on the scale between "semi-useful toy," "ends white-collar work as we know it," and "accidentally destroys mankind." Beyond that, there isn't much we can say.
The A.I. knows who you are. The A.I. knows where you are. The A.I. just wants you to be happy. Are you happy? (The A.I. preps a "Happiness Compliance Team" for deployment.)
It is. I've been waiting for the brilliant take, the one that makes me say, "wow, this really gets to the essence of the whole thing." I haven't found it yet. It's not the blurry jpeg one, it's not the "old-school computational linguistics/GOFAI had it right" ones, it's not the one with various Mario analogies. By having at its heart, "I don't know," this one perhaps comes closest.
By all means, please write the brilliant take yourself, and I will happily upvote and reference it.
I, for one, would be more interested in a take on the opposite end of the problem: what does it all mean for our understanding of what thinking/talking/being human is? Because what kind of baffles me with LLMs is how much they can do, from how little they are made of, compared to our (or at least my) general expectations for AGI.
I've been thinking about the same thing, and have to admit that I, most of the time, AM a stochastic parrot. Being a parrot saves a lot of energy, which is probably the reason.
But it seems our brain has other mechanisms that get turned on occasionally (or run in the background) that go much beyond that.
In many ways yes, but more interesting to this subject in particular: no. We are actually underestimating the human involvement in LLM behavior by personifying the LLM itself.
An LLM can only "exhibit" the behavior that humans encode into text. We encode explicit behavior, including symbols and grammar, and we encode implicit behavior, including narrative and logic.
An LLM blindly models the patterns of text, but doing that exposes the power of the patterns themselves.
The only problem is that an LLM can't determine which patterns are useful and which patterns aren't.
Agreed, as with a sibling comment this is how I've been thinking about it. We overestimate what we are, the corollary being that we underestimate what other creatures are.
My (non-brilliant) take is that LLMs are basically faster, cheaper versions of Mechanical Turk (Amazon's, not the 18th century automaton).
Like Mechanical Turk, you need to "program" by giving English instructions and the results can be (depending on the instructions) non-deterministic.
But Mechanical Turk did not revolutionize computing. Can a faster and cheaper version do so? Sometimes incremental improvements turn into paradigm shifts. But sometimes not. I guess we'll see (which is another lukewarm take--sorry).
Not a full take, but I have some relevant experience that’s formulating into an idea.
1) Running an agency, it was quick and easy to share ideas and give direction and have that executed to a fairly accurate degree over a decent period of time.
2) Working with GPT-3 & GPT-4, it’s also quick and easy to share ideas, but I’m becoming more aware of how surface level my communication is. When I get back great results, it’s typically because I’m requesting busy work. When I’m requesting something novel, it quickly becomes clear what I forgot to define.
So, the idea that’s formulating around LLMs is that the process of transforming an idea into instructions will become much more desirable. And that we’re moving from selecting for people who know how to do the work, to selecting for people who know how to request the work. And that those are two very different skill sets.
Sure, but as long as the human experience remains unique and beyond the grasp of AI, then there will be human creativity and ingenuity seeking to improve that experience.
But yes, there are plenty of demonstrations of LLMs using themselves to accomplish tasks or even recruiting humans to accomplish a task.
Here goes: LLMs don't behave, they exhibit behavior.
Whose behavior? Whoever wrote the training corpus and the prompt. So far, that definitely means one or more humans.
The problem is that nearly everything you have read about LLMs has personified them. The character of an LLM as a person is invented, then conclusions about LLMs are drawn from that character.
Nearly everything that is interesting about LLMs is not actually an LLM feature: it's a language feature. Language is an incredibly powerful tool, and an incredibly versatile data format.
What is the data? Human behavior.
When a human writes text, they don't just spit out characters at random: humans explicitly choose the characters they write, and implicitly choose the characters they don't write.
In the most literal sense, text contains the entropy of a human making a string of choices about which character to write next. That's a 1-dimensional ordered list, just like the string of characters is. Text gets a lot more interesting when we introduce language...
The reason a human chooses one character over another is not 1-dimensional. There are patterns of entropy driving that decision, and we explicitly know about some of them.
We have defined words, punctuation, grammar, idioms, etc. that we structure text with. We call these patterns language. These are the explicit patterns that allow us to hack the writing process: instead of encoding a 1-dimensional list of decisions, we encode a recursive many-dimensional structure of symbols and grammar.
Now for the interesting part: when humans write language into text, our intentions become explicit patterns, but they aren't the only patterns that end up there.
Every arbitrary decision about writing style gets implicitly encoded as a pattern of negative space. The reasons why we write one concept instead of another get implicitly encoded, too. It's a bit lossy, but most of these patterns end up in the text.
---
So what does an LLM do? It implicitly models patterns, and presents them.
That's it. That's the only behavior. A "continuation" is made by modeling the prompt, and showing what's "next" in the model.
An LLM explicitly knows nothing at all; but implicitly, it knows everything. It models every pattern that exists in the text, but it doesn't have a clue what or why it is modeling. A pattern is a pattern.
The whole system ends up generating valid language, because most of the implicit patterns an LLM models "align to" the explicit patterns humans (language) encoded into the text.
The whole system ends up "exhibiting behavior", because it also modeled the implicit patterns of human behavior (narrative) that are also encoded into the text. Patterns of narrative determine which part of the model a prompt explores, and which pattern is considered "next".
---
Next time you read about the "features" and the "limitations" of an LLM (including GPT-4), know that you are really reading about the "features" of language itself, and the "features" of the narratives that were written in the training corpus and prompts.
The behavior that an LLM exhibits has much less to do with how well its model "aligns with" language grammar, and much more to do with how well the text itself (and the narrative it contains) will behave.
If physical reality is quantized, which many believe to be true, then the state of all of reality can be described symbolically. Therefore, a large enough language model could conceivably model physical reality. At some point, we could construct an LLM powerful enough to manipulate the lowest-level quanta of reality given a description of the manipulation in another language. The tricky part is resolving ambiguity, as the only truly unambiguous description is the direct description of the changes at the subatomic level.
What is the language of the human mind? Of consciousness itself? If an LLM can learn that, perhaps we will build a technology that takes an intention and is immediately able to manifest the corresponding changes to reality to satisfy the intention or desire? When we get there, perhaps it will be time to look inward and do some of that spiritual inner work we keep putting off?
What’s exciting to me about LLMs is they seem to be one step closer to this vision, with all the peril and all the possibility that comes with it.
You're right: when I say quantized, I do mean finite, countable, representable symbolically. I realize how staggeringly large finite can be.
We are in the midst of exponential growth in our ability to manipulate reality. The mastery of physical reality feels achievable to me now. I could be wrong, but I could also be right. It’s a fun time to be alive.
But they are fancy, expensive Markov chains. The lesson here is that fancy, expensive Markov chains can do a lot, not that the statement is wrong.