The study focuses on Reinforcement Learning from Human Feedback (RLHF) and how its reward signal can be improved. In standard RLHF, the language model generates a completion in response to a query, and a separate reward model assigns a single scalar score to the full completion, leaving the per-token reward signal sparse. The researchers propose a method that uses the attention weights of the reward model's transformer to redistribute that scalar reward across the tokens of the completion, densifying the signal and highlighting the most important tokens. This approach stabilises training, accelerates the rate of learning, and can lead to better local optima.
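The sketch below illustrates the general idea of attention-based reward redistribution, not the paper's actual implementation: attention weights averaged over layers and heads are used to split one sequence-level reward into per-token rewards. The function name `redistribute_reward`, the use of the final position's attention row, and the reward-model output format (a tuple of per-layer attention tensors, as returned by Hugging Face models with `output_attentions=True`) are assumptions made for illustration.

```python
import torch

def redistribute_reward(scalar_reward, attentions, completion_mask):
    """Spread a single sequence-level reward over completion tokens
    using the reward model's attention weights (illustrative sketch).

    scalar_reward:   float, reward assigned to the full completion
    attentions:      tuple of tensors, one per layer, each of shape
                     (batch=1, heads, seq_len, seq_len)
    completion_mask: bool tensor of shape (seq_len,), True for completion tokens
    """
    # Average attention over layers and heads -> (seq_len, seq_len).
    attn = torch.stack([a[0].mean(dim=0) for a in attentions]).mean(dim=0)

    # Attention paid by the final position to every token (assumed here to be
    # the position whose representation feeds the reward head).
    token_weights = attn[-1]

    # Keep only completion tokens and renormalise so the weights sum to 1.
    token_weights = token_weights * completion_mask.float()
    token_weights = token_weights / token_weights.sum().clamp(min=1e-8)

    # Dense per-token rewards that sum back to the original scalar reward.
    return scalar_reward * token_weights
```

Under this sketch, the resulting per-token rewards could replace the single end-of-sequence reward in the policy-gradient update, giving every generated token an immediate learning signal rather than one delayed reward at the end of the completion.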

 

Publication date: 1 Feb 2024
Project Page: https://arxiv.org/abs/2402.00782v1
Paper: https://arxiv.org/pdf/2402.00782