This academic paper introduces DeAL, a framework for Decoding-time ALignment of Large Language Models (LLMs). Current techniques align these models with human preferences at training time, typically via Reinforcement Learning with Human Feedback (RLHF). However, such training-time methods bake in a fixed, developer-chosen set of alignment objectives, make it hard to incorporate multiple or custom rewards, and leave residual gaps (e.g., susceptibility to jailbreaking even after safety training). DeAL instead treats decoding as a heuristic-guided search: the user supplies custom reward functions, ranging from programmatic constraints such as keyword or length requirements to abstract objectives such as harmlessness and helpfulness, and these are applied while the model generates. This lets the approach navigate fine-grained trade-offs between objectives and improve adherence to them without retraining the model.
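To make the "decoding as heuristic-guided search" idea concrete, here is a minimal reward-guided beam-search sketch in Python. It is an illustration only, not the paper's implementation: the `next_token_logprobs` and `alignment_reward` callables, the `reward_weight` combination, and the beam bookkeeping are all assumptions introduced for this example.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces (assumed for this sketch, not from the paper's code):
#   next_token_logprobs(prefix) -> list of (token_id, logprob) candidate continuations
#   alignment_reward(prefix)    -> scalar score of how well the partial sequence meets the objective

def reward_guided_beam_search(
    prompt: Sequence[int],
    next_token_logprobs: Callable[[Sequence[int]], List[Tuple[int, float]]],
    alignment_reward: Callable[[Sequence[int]], float],
    eos_id: int,
    beam_width: int = 4,
    reward_weight: float = 1.0,
    max_new_tokens: int = 64,
) -> List[int]:
    """Rank hypotheses by LM log-probability plus a weighted alignment reward."""
    beams = [(0.0, list(prompt))]  # (cumulative LM log-prob, token sequence)
    for _ in range(max_new_tokens):
        candidates = []
        for lm_score, seq in beams:
            if seq[-1] == eos_id:
                candidates.append((lm_score, seq))  # keep finished hypotheses as-is
                continue
            for tok, lp in next_token_logprobs(seq):
                candidates.append((lm_score + lp, seq + [tok]))
        # Combine fluency (LM score) with the user-supplied alignment heuristic.
        scored = [
            (lm_score + reward_weight * alignment_reward(seq), lm_score, seq)
            for lm_score, seq in candidates
        ]
        top = heapq.nlargest(beam_width, scored, key=lambda x: x[0])
        beams = [(lm_score, seq) for _, lm_score, seq in top]
        if all(seq[-1] == eos_id for _, seq in beams):
            break
    best = max(beams, key=lambda b: b[0] + reward_weight * alignment_reward(b[1]))
    return best[1]
```

Swapping in a different `alignment_reward` (say, a keyword-coverage check or a harmlessness classifier) changes the alignment objective at decoding time, which is the flexibility the summary above refers to.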

Publication date: 12 Feb 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2402.06147