Generalizing Reward Modeling for Out-of-Distribution Preference Learning
Preference learning aims to align the generations of large language models (LLMs) with human preferences. Most prior work focuses on in-distribution preference learning, whereas this research addresses out-of-distribution (OOD) preference…