I'm reminded of the Physics of Language Models[1] where they showed a standard a...

magicalhippo 10 months ago, on: Block Diffusion: Interpolating between autoregress...

I'm reminded of the Physics of Language Models[1], where they showed that a standard autoregressive LLM got a lot more accurate if the model got access to the backspace key, so to speak.

[1]: https://physics.allen-zhu.com/home