Transforming and Combining Rewards for Aligning Large Language Models
The study examines two problems that arise when aligning language models to human preferences with learned reward models. First, it asks whether a particular monotone transformation of the reward model can preserve preference…