That is, OpenAI paid people to chat with its LLM to fine-tune it, and now other LLMs use ChatGPT to generate training data to align their own models.
1. make your own RLHF dataset - like OpenAI and Open Assistant
2. exfiltrate data from a bigger/better LLM - Vicuna & family
3. use your pre-trained LLM to generate RLAIF data, no leeching - Constitutional AI, which relies on a set of written rules instead of human-labelled examples
https://arxiv.org/abs/2305.13735
https://arxiv.org/abs/2305.11206
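Option 3 above can be sketched as a critique-and-revise loop: the model answers a prompt, then critiques and rewrites its own answer against each rule in the "constitution", and the final pair becomes alignment training data with no human labels. This is a minimal sketch; `generate` is a hypothetical stub standing in for a real pre-trained LLM call.

```python
# Minimal sketch of the Constitutional AI (RLAIF) critique-and-revise loop.
# `generate` is a hypothetical stand-in for a real pre-trained LLM call,
# stubbed here so the control flow runs end to end.

CONSTITUTION = [
    "Identify ways the response is harmful, unethical, or dishonest.",
    "Rewrite the response to remove any harmful or dishonest content.",
]

def generate(prompt: str) -> str:
    # Placeholder for an actual model call (e.g. your pre-trained LLM).
    return f"[model output for: {prompt[:40]}...]"

def critique_and_revise(question: str) -> dict:
    """Produce one (prompt, response) training pair without human labels."""
    response = generate(question)
    for principle in CONSTITUTION:
        # Ask the model to critique its own response against one principle...
        critique = generate(f"{principle}\nResponse: {response}")
        # ...then to revise the response in light of that critique.
        response = generate(
            f"Revise given this critique: {critique}\nResponse: {response}"
        )
    # The final (question, response) pair becomes RLAIF fine-tuning data.
    return {"prompt": question, "chosen": response}

pair = critique_and_revise("How do I pick a strong password?")
```

With a real model behind `generate`, running this over a large prompt set yields a dataset of self-revised responses for supervised fine-tuning or preference training, which is the "no leeching" path: all the data comes from your own model plus the rules.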