If you consider the entire input to the language model as the state, and the output to be that input concatenated with the next token, then it's a Markov chain.
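
In code, that reading looks roughly like the sketch below. next_token_distribution is a made-up stand-in for a real model's forward pass, not any actual API; the point is just that the state is the whole sequence and one transition appends one token.

    # Toy sketch of decoding viewed as a Markov chain. The state is
    # the full token sequence; one transition appends one token.
    import random

    def next_token_distribution(state):
        # Hypothetical stand-in: a real LLM would compute
        # P(next token | state) from a forward pass here.
        vocab = ["the", "cat", "sat", "<eos>"]
        weights = [0.4, 0.3, 0.2, 0.1]
        return vocab, weights

    def step(state):
        # Markov property: the next state depends only on the
        # current state (the sequence so far), nothing else.
        vocab, weights = next_token_distribution(state)
        token = random.choices(vocab, weights=weights)[0]
        return state + (token,)

    state = ("the",)
    while state[-1] != "<eos>" and len(state) < 20:
        state = step(state)
    print(state)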

But that's only true for greedy or sampled decoding, not for beam search, which some LLM setups use. With beam search at beam width 4, for example (a common default), the decoder does a tree search, keeping the 4 highest-probability partial outputs at each step. Then the process generating any single output is arguably no longer a Markov chain, it just uses one; though if you take the whole set of 4 beams together as the state, it becomes Markovian again.
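
A rough sketch of that, again assuming the same made-up next_token_distribution from above:

    # Minimal beam search sketch (beam width 4). A single hypothesis's
    # fate depends on its siblings, so one sequence is not a Markov
    # chain on its own; the set of all beams, taken as the state, is.
    import math
    from heapq import nlargest

    def beam_search(initial, next_token_distribution,
                    beam_width=4, max_len=20):
        beams = [(0.0, initial)]  # (cumulative log-prob, sequence)
        for _ in range(max_len):
            candidates = []
            for logp, seq in beams:
                if seq[-1] == "<eos>":
                    candidates.append((logp, seq))  # finished: carry over
                    continue
                vocab, probs = next_token_distribution(seq)
                for token, p in zip(vocab, probs):
                    candidates.append((logp + math.log(p), seq + (token,)))
            # Keep only the top-k partial outputs at each step.
            beams = nlargest(beam_width, candidates, key=lambda c: c[0])
            if all(seq[-1] == "<eos>" for _, seq in beams):
                break
        return max(beams, key=lambda c: c[0])[1]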


