The paper investigates how annotator agreement in preference data affects the efficacy of Reinforcement Learning from Human Feedback (RLHF) for text summarization. The authors show that training on human preferences spanning a range of agreement levels yields more accurate reward models and changes which quality characteristics those models capture. They also report improved downstream generation when using a reward model trained on this broader range of preference agreement. The findings have implications for the design of synthetic preference datasets and underscore the importance of accounting for quality differentials in comparison-based data.
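To make the setup concrete, below is a minimal sketch (not the paper's code) of training a pairwise reward model on preference comparisons, where each pair carries an annotator-agreement level that can be used to weight or filter the training signal. It assumes PyTorch and a standard Bradley-Terry style objective; the toy encoder, the random data, and the per-pair agreement weighting are illustrative assumptions, not the authors' method.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy scorer; in practice this would be a language-model backbone
    with a scalar head over (document, summary) pairs."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

def preference_loss(r_chosen, r_rejected, weight=None):
    # Bradley-Terry style objective: -log sigmoid(r_chosen - r_rejected),
    # optionally weighted per pair (e.g. by annotator agreement).
    loss = -nn.functional.logsigmoid(r_chosen - r_rejected)
    if weight is not None:
        loss = loss * weight
    return loss.mean()

# Toy data: feature vectors for chosen/rejected summaries plus an
# agreement level per pair (fraction of annotators preferring "chosen").
dim, n_pairs = 16, 128
chosen = torch.randn(n_pairs, dim)
rejected = torch.randn(n_pairs, dim)
agreement = torch.rand(n_pairs) * 0.5 + 0.5  # hypothetical values in [0.5, 1.0]

model = RewardModel(dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    opt.zero_grad()
    loss = preference_loss(model(chosen), model(rejected), weight=agreement)
    loss.backward()
    opt.step()
```

Restricting the dataset to only high-agreement pairs, or weighting pairs uniformly, would be the natural ablations for studying how the mix of agreement levels shapes the learned reward.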


Publication date: 2 Nov 2023
Project Page: https://arxiv.org/abs/2311.04919
Paper: https://arxiv.org/pdf/2311.04919