Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
The article introduces Q-probing, a method that adapts a pre-trained language model to maximize a task-specific reward function. The approach sits between heavier methods like finetuning and lighter ones like few-shot prompting.
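As a rough illustration of what reward maximization with a probe can look like, here is a minimal sketch: a small linear probe scores k candidate completions via their embeddings from the frozen base model, and one completion is sampled with probability given by a softmax over those scores. All names, shapes, the temperature value, and the softmax-sampling choice are assumptions made for illustration, not details taken from the article.

```python
# Minimal sketch of probe-based reward maximization (illustrative, not the
# paper's exact implementation): a linear probe scores candidate completions
# from a frozen base model, and one candidate is sampled via a softmax over
# those scores. Embeddings and probe weights below are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)


def probe_scores(embeddings: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Linear probe: one scalar value per candidate completion."""
    return embeddings @ w


def select_completion(embeddings: np.ndarray, w: np.ndarray,
                      temperature: float = 1.0) -> int:
    """Sample a candidate index with probability softmax(values / temperature)."""
    values = probe_scores(embeddings, w)
    logits = values / temperature
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))


# Stand-ins for real data: k candidate completions, each represented by a
# d-dimensional embedding from the frozen model, plus a trained probe vector w.
k, d = 8, 16
candidate_embeddings = rng.normal(size=(k, d))
w = rng.normal(size=d)

chosen = select_completion(candidate_embeddings, w, temperature=0.5)
print(f"selected candidate index: {chosen}")
```

In this sketch the base model's weights are never updated: only the probe vector is trained against the reward, which is what keeps the approach lighter than finetuning.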