
There are three ways:

1. Make your own RLHF dataset, as OpenAI and Open Assistant did.

2. Exfiltrate data from a bigger/better LLM, as Vicuna & family do.

3. Use your own pre-trained LLM to generate RLAIF data, with no leeching. This is the Constitutional AI approach, which is based on a set of rules instead of labelled examples.



I wonder whether these approaches fit into the above categories:

https://arxiv.org/abs/2305.13735

https://arxiv.org/abs/2305.11206



