Hacker News

> If there is an aspect of the world that isn't covered by the training text, the model cannot divine it. It is just correlating text to other text.

You could say the same for human perception, though. We don't actually "see" what our brain thinks we see; our brain fills in the image. Nonetheless, that doesn't stop us from having and using the slightly-inaccurate world-model built from this info. And we can often detect our faulty perceptions by comparative methods, like "banana for scale", or by looking for conflicts and contradictions.



I think there is a pretty sizeable difference between "human perception of the world is necessarily mediated by sense organs" and "all human textual output exhaustively covers the world and can be used to comprehensively describe and navigate it". Unless we are solipsists, we tend to agree that the world maintains a presence whether we are sensing it or not, and that even our senses are merely a limited reflection of what the world is and not the ground truth; that when our ideas or senses don't match the world, the world wins. For an LLM however there is nothing outside of text.


First, GPT-4 was trained on images and text, not just text. The images improved its text predictions because they helped populate its world model. It outputs only text, but just like a Unix program writing text to stdout, nothing about the fact that your output format is constrained restricts the kind of computation you can perform in service of that output.

  Input: text and images
  Computation: ?
  Output: text
I think you're suggesting/asserting that the computation step -- the hidden layers -- must also be focused on text. But there's no such constraint in reality.
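The stdout analogy above can be made concrete with a small sketch (a hypothetical illustration, not anything from GPT-4 itself): the program's internal computation is purely numeric, operating on image data, yet its only output is a line of text, like any Unix filter.

```python
def describe_image(pixels):
    """Compute the mean brightness of a grayscale image (a list of rows
    of pixel values) and return a one-line textual description."""
    total = sum(sum(row) for row in pixels)
    count = sum(len(row) for row in pixels)
    mean = total / count
    # The arithmetic above is not "text processing" in any sense,
    # but the output format is still plain text.
    return f"mean brightness: {mean:.1f}"

if __name__ == "__main__":
    image = [[0, 128], [255, 129]]  # toy 2x2 grayscale image
    print(describe_image(image))    # prints "mean brightness: 128.0"
```

The point being: constraining the output channel to text says nothing about what the hidden computation does.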

I don't think it's such a stretch to see the billions of written words given to GPT-4 as essentially a new kind of sense organ. They make it capable of rejecting the untrue claims made in its training set, because (a) the untrue claims are massively overwhelmed in number by the true claims, and (b) the true claims usually come with links to other knowledge that reinforces them.



