Theoretical guarantees on the best-of-n alignment policy
The paper discusses the best-of-n policy used for aligning generative models. It disproves a common claim that the KL divergence between the best-of-n policy and the base policy is equal…
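As a point of reference, best-of-n sampling can be sketched as follows: draw n candidates from the base policy and keep the one with the highest reward. This is only an illustrative sketch; the names `best_of_n`, `base_sample`, and `reward`, along with the toy policy and reward, are assumptions made here and do not come from the paper.

```python
import random


def best_of_n(sample_fn, reward_fn, n):
    """Draw n candidates from the base policy and return the highest-reward one."""
    candidates = [sample_fn() for _ in range(n)]
    return max(candidates, key=reward_fn)


# Hypothetical toy setup: the base policy is uniform over the integers 0..9,
# and the reward of an output is simply its value.
rng = random.Random(0)


def base_sample():
    return rng.randrange(10)


def reward(y):
    return y


picked = best_of_n(base_sample, reward, n=4)
print(picked)
```

With n = 1 the procedure returns a plain sample from the base policy; larger n shifts the output distribution toward high-reward samples and therefore further from the base policy.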