> The traditional way to do RAG is to find information relevant to a query - and then incorporate it into the LLM prompt together with the question we want it to answer.
Technically, this is incorrect. The original RAG paper used a seq2seq generator (BART) and proposed two variants: RAG-Sequence and RAG-Token.
RAG-Sequence used the same fixed set of retrieved documents for the whole output, appended to the input query (note that this is different from a decoder-only model), while RAG-Token generated each token conditioned on a potentially different document.
I only nitpick this because if someone is going to invent fancy-sounding new variants of RAG, they should at least get the basics right.
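
To make the distinction concrete, here is a rough sketch of how the two variants marginalize over the top-k retrieved documents. The `p_retriever` and `p_generator` callables are hypothetical stand-ins for the DPR retriever and BART generator scores, not the paper's actual code:

```python
# Sketch of the two marginalization schemes from the original RAG paper,
# assuming caller-supplied probability functions:
#   p_retriever(z, x)              -> p(z | x), retrieval score of document z
#   p_generator(tok, x, z, prefix) -> p(tok | x, z, prefix), generator score

def rag_sequence_prob(x, y, topk_docs, p_retriever, p_generator):
    # RAG-Sequence: each candidate document is used for the *whole* output
    # sequence; the per-document sequence probabilities are then mixed.
    total = 0.0
    for z in topk_docs:
        seq_prob = 1.0
        for i in range(len(y)):
            seq_prob *= p_generator(y[i], x, z, y[:i])
        total += p_retriever(z, x) * seq_prob
    return total

def rag_token_prob(x, y, topk_docs, p_retriever, p_generator):
    # RAG-Token: every generated token gets its own mixture over documents,
    # so different tokens can effectively draw on different documents.
    total = 1.0
    for i in range(len(y)):
        token_prob = sum(
            p_retriever(z, x) * p_generator(y[i], x, z, y[:i])
            for z in topk_docs
        )
        total *= token_prob
    return total
```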