I'm not so sure that view is very widespread amongst people familiar with how LLMs work. Certainly they become more capable with more parameters and data, but there are fundamental limitations that can't be overcome by a basic model, and I don't think anyone is seriously arguing otherwise.
For instance, LLMs are pretty much stateless apart from their context window. If you treat the raw generated output as the first and final result, there is very little scope for any advanced consideration of anything.
If you give it a nice long context, give it the ability to edit that context or even access to a key-value function interface, and then treat everything it says as internal monologue except for whatever appears in <aloud></aloud> tags (which is what the user gets to see), you have something quite different. There are plenty of people who see AGI somewhere along that path, but once you take a step down that path it's no longer "just an LLM": the LLM is a component in a greater system.
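To make that concrete, here is a minimal sketch of such a wrapper. The generate() stub, the tag names, and the <set>/<get> key-value syntax are all invented for illustration; they just show the shape of "LLM as a component in a greater system".

    import re

    store = {}  # key-value memory that persists between turns

    def generate(context: str) -> str:
        """Stand-in for the actual model call (API or local); replace as needed."""
        raise NotImplementedError

    def run_turn(context: str) -> tuple[str, str]:
        raw = generate(context)

        # Handle simple key-value "function calls" embedded in the monologue,
        # e.g. <set key="plan">draft an outline first</set> and <get key="plan"/>.
        for key, value in re.findall(r'<set key="([^"]+)">(.*?)</set>', raw, re.S):
            store[key] = value
        raw = re.sub(r'<get key="([^"]+)"\s*/>',
                     lambda m: store.get(m.group(1), ""), raw)

        # Everything is internal monologue except <aloud>...</aloud>.
        visible = "\n".join(re.findall(r"<aloud>(.*?)</aloud>", raw, re.S))

        # The full monologue goes back into the context for the next turn;
        # only `visible` is shown to the user.
        return context + raw, visible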
The problem with <aloud></aloud> is that you need the internal monologue to not be subject to training loss, otherwise the internal monologue is restricted to the training distribution.
Something people don't seem to grasp is that the training data mostly doesn't contain any reasoning. Nobody has published brain activity recordings on the internet, only text written in human language.
People see information, process it internally in their own head, which is not subject to any outside authority, and then serialize the answer into human language, which is subject to outside authorities.
Think of the inverse. What if school teachers could read their students' thoughts and punish any student who thinks the wrong thoughts? You would expect the intelligence of the class to decline rapidly.
That does sound invasive, but on the other hand, math teachers do tell kids to “show their work” for good reasons. And the consent issues don't apply to LLM training.
I wonder if the trend towards using synthetic, AI-generated training data will make it easier to train models that use <aloud> effectively. AIs could be trained to use reasoning and show their work more than people normally do when posting on the Internet. It's not going to create information out of nothing, but it will better model the distribution that the researchers want the LLM to have, rather than taking distributions found on the Internet as given.
It's not a natural distribution anyway. I believe it's already the case that people train models on weighted distributions, sampling more from Wikipedia, for example.
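Roughly along these lines, as a toy illustration (the source names and weights here are invented): pick which corpus to draw from by weight, then pick a document within it.

    import random

    # Invented sources and weights; real mixtures are far larger and tuned carefully.
    sources = {
        "wikipedia": {"weight": 3.0, "docs": ["wiki article 1", "wiki article 2"]},
        "web_crawl": {"weight": 1.0, "docs": ["page 1", "page 2", "page 3"]},
    }

    names = list(sources)
    weights = [sources[n]["weight"] for n in names]

    def sample_document() -> str:
        """Pick a source by weight, then a document uniformly within it."""
        name = random.choices(names, weights=weights, k=1)[0]
        return random.choice(sources[name]["docs"])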
My guess is that the quest for the best training data has only just begun.
I think you are looking at too narrow an avenue for achieving this effect.
There are multiple avenues for training a model to do this. The simplest is a finetune on training examples where the internal monologue precedes the <aloud> tag and provides additional reasoning before the output.
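A single made-up example of that shape (purely illustrative, not from any real dataset): the completion reasons first, and only the <aloud> span is what the user-facing layer would display.

    # Hypothetical finetune example: reasoning precedes the <aloud> span.
    example = {
        "prompt": "User: What is 17 * 24?\nAssistant:",
        "completion": (
            "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408. "
            "Check: 24 * 17 = 240 + 168 = 408. "
            "<aloud>17 * 24 = 408.</aloud>"
        ),
    }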
I think there is also scope for pretraining with a mask so the model doesn't attempt to predict certain things in the stream (or, equivalently, the loss on them is ignored). For example, time codes could be included in the data stream. The model could then have an awareness of the passing of time but would not generate time codes as predictions.
Time codes could then be injected into the context at inference time and it would be able to use that data.
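A sketch of that loss masking in a PyTorch-style setup, assuming you already know which positions hold the injected time codes; the masking scheme here is my assumption of how it could work, not a published recipe.

    import torch
    import torch.nn.functional as F

    IGNORE_INDEX = -100  # cross_entropy skips positions with this label

    def masked_next_token_loss(logits, input_ids, timecode_mask):
        """
        logits:        (batch, seq, vocab) model outputs
        input_ids:     (batch, seq) token ids, reused as next-token targets
        timecode_mask: (batch, seq) bool, True where the token is a time code
        """
        labels = input_ids.clone()
        labels[timecode_mask] = IGNORE_INDEX  # never train to emit time codes
        # standard next-token shift: position t predicts token t+1
        shift_logits = logits[:, :-1, :].contiguous()
        shift_labels = labels[:, 1:].contiguous()
        return F.cross_entropy(
            shift_logits.view(-1, shift_logits.size(-1)),
            shift_labels.view(-1),
            ignore_index=IGNORE_INDEX,
        )

The model still attends to the time codes on the input side; it just never gets a gradient pushing it to produce them.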
I noticed some examples from Anthropic's Golden Gate Claude paper had responses starting with <scratchpad> for the inverse effect. Suppressing the output up to the end of the paragraph would be an easy post-processing operation.
It's probably better to have implicitly closed tags rather than requiring a close tag. It would be quite easy for an LLM to miss a close tag and be off in dreamland.
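That suppression is simple enough to do in post-processing, along the lines of the sketch below. The tag name follows the comments above, and treating a missing close tag as ending at the next blank line is my reading of what "implicitly closed" would mean.

    import re

    def strip_scratchpad(text: str) -> str:
        # Explicitly closed scratchpads.
        text = re.sub(r"<scratchpad>.*?</scratchpad>", "", text, flags=re.S)
        # Unclosed scratchpads: suppress only to the end of the paragraph
        # (the next blank line) rather than swallowing the rest of the output.
        text = re.sub(r"<scratchpad>.*?(?=\n\n|\Z)", "", text, flags=re.S)
        return text.strip()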
Possibly addressing comments to the user or to itself might allow for considering multiple streams of thought simultaneously. IRC logs would be decent training data for figuring out multi-voice conversations (maybe).