Are LLMs truly non-deterministic? My impression was that we inject some randomne...

lsy · 2025-03-28T18:04:16 1743185056

In addition to Kamshak's note that parallel inference accumulates float errors differently depending on the order of operations, which makes LLMs theoretically non-deterministic even at temperature 0, there is the issue of them being practically non-deterministic as deployed: not just via temperature, but because prior "turns" are included in the context, prompts vary in phrasing, etc.

It's also "non-deterministic" in the sense that if you removed all sources of non-determinism, asked "What is 1+1?", and received the answer "2" deterministically, that wouldn't guarantee a correct answer for "What is 1+2?". I.e., a variation in the input isn't correlated in a logical way with a variation in the output, which is somewhat fatal for computer programs, where the goal is to generalize a solution across a range of inputs.

Kamshak · 2025-03-28T17:26:43 1743182803

There is also unintentional randomness due to the parallelism in inference (e.g. parallel matmuls added together on the GPU). Since it's multiplying and adding floats, every operation has rounding drift that accumulates differently depending on the order of operations. So even at temperature 0 you're not guaranteed deterministic outputs.

naveen99 · 2025-03-29T18:02:13 1743271333

Because addition and multiplication are not associative with floats?
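
To make the associativity point concrete, here is a minimal sketch in plain Python (IEEE 754 doubles). The specific numbers are illustrative choices, not values taken from any real inference run; the point is only that regrouping the same additions changes the rounded result, which is what a parallel reduction does when it picks a different summation order. Multiplication has the same property in general, although it is not shown here.

```python
# Floating-point addition is not associative: grouping changes the rounding.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)  # False

# The same three terms summed in two orders, as a reduction might choose.
big = 1e16
left_to_right = (big + 1.0) - big   # the 1.0 is rounded away when added to 1e16
reordered = (big - big) + 1.0       # the 1.0 survives
print(left_to_right, reordered)     # 0.0 1.0
```

In a model, differences this small only matter when two candidate tokens have nearly equal scores, but once one token flips, everything generated after it can diverge.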

CuriouslyC · 2025-03-28T16:28:02 1743179282

The probability distribution over next tokens given previous tokens is deterministic. The sampling algorithm for that distribution is non-deterministic.
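
A rough sketch of that split, using only the Python standard library. The `logits` values and the three-token vocabulary are made up for illustration, and `softmax` here is a hand-written helper, not any particular framework's API. Computing the distribution is a pure function of its input; drawing from it with an unseeded generator is where the run-to-run variation comes from.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]   # hypothetical scores for three tokens
probs = softmax(logits)

# Same input, same distribution, every time.
print(probs == softmax(logits))  # True

# Unseeded sampling: the draws can differ from one run to the next.
draws = [random.choices(range(len(probs)), weights=probs)[0] for _ in range(10)]
print(draws)
```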

krallistic · 2025-03-28T16:35:52 1743179752

And sampling from a (now fixed) distribution can be made deterministic...

So the total generation of text from an LLM can be made fully deterministic. The problem for scientists is that we can't do that in the deployed systems...
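
As a small illustration of making the sampling step deterministic, the sketch below fixes the seed of a private random generator. The probabilities and the helper name `sample_tokens` are hypothetical; whether a hosted API exposes an equivalent seed control depends on the provider.

```python
import random

def sample_tokens(probs, n, seed):
    """Draw n token indices from a fixed distribution with a seeded generator."""
    rng = random.Random(seed)  # private generator, unaffected by global state
    return rng.choices(range(len(probs)), weights=probs, k=n)

probs = [0.5, 0.3, 0.2]   # a fixed, made-up next-token distribution
first = sample_tokens(probs, 20, seed=1234)
second = sample_tokens(probs, 20, seed=1234)
print(first == second)  # True
```

This only pins down the sampling; it assumes the distribution itself is reproduced exactly on every run.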

CuriouslyC · 2025-03-28T17:05:58 1743181558

You can set the temperature to zero in most APIs, which gives deterministic output. The only problem with that is some models produce inferior results with zero temperature, including lots of slop and AI-isms.
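
For reference, a minimal sketch of what temperature does to the sampling step, assuming the common convention that temperature 0 means greedy (argmax) decoding. The function name `next_token` and the `logits` values are illustrative, not any vendor's API. Note that the greedy branch is deterministic only if the logits are bit-identical between runs, which is the caveat Kamshak raises about floating-point accumulation.

```python
import math
import random

def next_token(logits, temperature, rng):
    """Pick a token index; temperature 0 is treated as greedy argmax."""
    if temperature == 0:
        # Greedy decoding: no randomness, ties go to the lowest index.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Divide scores by the temperature, then sample in proportion to exp(score).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights)[0]

logits = [1.2, 3.4, 0.7]   # hypothetical scores for three tokens
rng = random.Random()      # unseeded on purpose

greedy = [next_token(logits, 0, rng) for _ in range(5)]
print(greedy)  # [1, 1, 1, 1, 1]

sampled = [next_token(logits, 1.0, rng) for _ in range(5)]
print(sampled)  # varies between runs
```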