There is also unintentional randomness due to the parallelism in inference (e.g. partial results from parallel matmuls being added together on the GPU). Floating-point arithmetic isn't associative: every operation rounds its result, and that rounding error accumulates differently depending on the order of operations, which parallel execution doesn't fix. So even at temperature 0 you're not guaranteed deterministic outputs.
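To make the order-of-operations point concrete, here's a toy sketch in plain Python (CPU doubles, not a real GPU kernel). The helper names `sequential_sum` and `chunked_sum` are made up for illustration; the chunked version just mimics a parallel reduction where each worker sums its own slice and the partial sums are combined afterwards.

```python
def sequential_sum(values):
    """Add the values strictly left to right."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, num_chunks):
    """Sum each chunk on its own, then add the partial sums.

    This is a stand-in for a parallel reduction (hypothetical helper,
    not how any particular GPU library schedules its work).
    """
    size = (len(values) + num_chunks - 1) // num_chunks  # ceiling division
    partials = [
        sequential_sum(values[i:i + size])
        for i in range(0, len(values), size)
    ]
    return sequential_sum(partials)


# Same four numbers, two different groupings.
values = [1e16, 1.0, -1e16, 1.0]

a = sequential_sum(values)   # ((1e16 + 1) - 1e16) + 1
b = chunked_sum(values, 2)   # (1e16 + 1) + (-1e16 + 1)

print(a, b, a == b)
```

The two groupings disagree with each other (and with the exact answer, 2) because doubles near 1e16 are spaced 2 apart, so a lone `+ 1.0` gets rounded away. Real inference workloads deal with much smaller discrepancies, but a tiny difference in a logit can be enough to flip which token wins the argmax.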