Hacker News

> If there is an aspect of the world that isn't covered by the training text, the model cannot divine it. It is just correlating text to other text.

You could say the same for human perception, though. We don't actually "see" what our brain thinks we see; our brain fills in the image. Nonetheless, that doesn't stop us from having and using the slightly-inaccurate world-model built from this info. And we can often detect our faulty perceptions by comparative methods, like "banana for scale", or by looking for conflicts and contradictions.



I think there is a pretty sizeable difference between "human perception of the world is necessarily mediated by sense organs" and "all human textual output exhaustively covers the world and can be used to comprehensively describe and navigate it". Unless we are solipsists, we tend to agree that the world maintains a presence whether we are sensing it or not, and that even our senses are merely a limited reflection of what the world is and not the ground truth; that when our ideas or senses don't match the world, the world wins. For an LLM however there is nothing outside of text.


First, GPT-4 was trained on images and text, not just text. The images improved its text predictions because they helped populate its world model. It outputs only text, but just like a Unix program writing text to stdout, nothing about the fact that your output format is constrained restricts the kind of computation you can perform in service of that output.

  Input: text and images
  Computation: ?
  Output: text
I think you're suggesting/asserting that the computation step -- the hidden layers -- must also be focused on text. But there's no such constraint in reality.
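The stdout analogy above can be made concrete with a small sketch (a hypothetical illustration, not anything from GPT-4 itself): the program's internal computation is purely numeric, operating on image data, yet its only output is a line of text, like any Unix filter.

```python
def describe_image(pixels):
    """Compute the mean brightness of a grayscale image (a list of rows
    of pixel values) and return a one-line textual description."""
    total = sum(sum(row) for row in pixels)
    count = sum(len(row) for row in pixels)
    mean = total / count
    # The arithmetic above is not "text processing" in any sense,
    # but the output format is still plain text.
    return f"mean brightness: {mean:.1f}"

if __name__ == "__main__":
    image = [[0, 128], [255, 129]]  # toy 2x2 grayscale image
    print(describe_image(image))    # prints "mean brightness: 128.0"
```

The point being: constraining the output channel to text says nothing about what the hidden computation does.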

I don't think it's such a stretch to see the billions of written words given to GPT-4 as essentially a new kind of sense organ. They make it capable of rejecting the untrue claims made in its training set, because (a) the untrue claims are massively overwhelmed in number by the true claims, and (b) the true claims usually come with links to other knowledge that reinforces them.



