The paper tackles the challenges of training large language models (LLMs) with Reinforcement Learning from Human Feedback (RLHF). The authors identify three properties of RLHF tasks, namely fast simulation, deterministic transitions, and trajectory-level rewards, that the currently dominant algorithm, PPO, does not exploit. To leverage them, they propose ReMax, a new algorithm that is simpler, more computationally efficient, and lighter on memory than PPO, without sacrificing performance; the memory savings come largely from dispensing with PPO's separate value model. The authors argue these benefits should carry over to larger-scale models.
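According to the paper, ReMax replaces PPO's learned value model with a simple baseline: the reward of the greedy response to the same prompt is subtracted from the reward of the sampled response, giving a REINFORCE-style estimator. Below is a minimal sketch of that update on a toy policy; the table-based policy and keyword reward are illustrative stand-ins, not the authors' implementation (see the repo linked below for that).

```python
# Sketch of the ReMax gradient estimator: REINFORCE with the reward of
# the greedy response as a variance-reducing baseline. No value network
# is trained, unlike PPO.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB, SEQ_LEN = 8, 5

# Toy "policy": one learnable logit row per position (an LLM in practice).
logits = torch.zeros(SEQ_LEN, VOCAB, requires_grad=True)

def reward(tokens):
    # Stand-in trajectory-level reward; RLHF would query a reward model.
    return (tokens == 3).float().sum()

def rollout(greedy=False):
    probs = F.softmax(logits, dim=-1)
    if greedy:
        tokens = probs.argmax(dim=-1)                     # deterministic response
    else:
        tokens = torch.multinomial(probs, 1).squeeze(-1)  # sampled response
    log_prob = torch.distributions.Categorical(probs=probs).log_prob(tokens).sum()
    return tokens, log_prob

opt = torch.optim.Adam([logits], lr=0.1)
for _ in range(200):
    y, log_p = rollout()                 # stochastic generation
    with torch.no_grad():
        y_bar, _ = rollout(greedy=True)  # greedy baseline generation
    # ReMax update direction: (r(y) - r(y_bar)) * grad log pi(y | x).
    advantage = reward(y) - reward(y_bar)
    loss = -advantage * log_p
    opt.zero_grad()
    loss.backward()
    opt.step()

print("learned greedy response:", rollout(greedy=True)[0].tolist())
```

Because the greedy rollout needs no gradient, the only trained component is the policy itself, which is where the memory savings over PPO come from.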

Publication date: 16 Oct 2023
Project Page: https://github.com/liziniu/ReMax
Paper: https://arxiv.org/pdf/2310.10505