1. make your own RLHF dataset - like OpenAI and Open Assistant
2. exfiltrate data from a bigger/better LLM - Vicuna & family
3. use your pre-trained LLM to generate RLAIF data, no leeching - ConstitutionalAI, based on a set of rules instead of labelling examples
https://arxiv.org/abs/2305.13735
https://arxiv.org/abs/2305.11206
1. make your own RLHF dataset - like OpenAI and Open Assistant
2. exfiltrate data from a bigger/better LLM - Vicuna & family
3. use your pre-trained LLM to generate RLAIF data, no leeching - ConstitutionalAI, based on a set of rules instead of labelling examples