This research paper addresses a limitation of autoregressive large language models (LLMs): although they compress knowledge from their training data into next-token conditional distributions, many tasks of interest (such as infilling and other forms of constrained generation) require sampling from intractable posterior distributions over that knowledge. The authors address this by using amortized Bayesian inference to sample from these intractable posteriors. The amortization is achieved by fine-tuning LLMs with a diversity-seeking reinforcement learning algorithm, generative flow networks (GFlowNets). The authors demonstrate that this approach enables efficient adaptation of LLMs to tasks requiring multi-step rationalization and tool use.
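GFlowNet fine-tuning is commonly trained with a trajectory-balance objective, which pushes the sampler's distribution over sequences toward being proportional to a reward (the unnormalized posterior). The following is a minimal toy sketch of that objective, not the paper's code: the policy, vocabulary, reward, and all names here are illustrative assumptions.

```python
import numpy as np

# Toy sketch of the trajectory-balance (TB) objective used in GFlowNet
# training (illustrative setup, not the paper's implementation).
# For a token sequence z = (z_1..z_T), the TB loss is
#   L(z) = (log Z + sum_t log p_F(z_t | z_<t) - log R(z))^2,
# which is zero exactly when the sampler draws z with probability R(z)/Z.

rng = np.random.default_rng(0)
VOCAB, LENGTH = 4, 3  # tiny vocabulary and sequence length for illustration

def sample_trajectory(logits):
    """Sample a token sequence from a per-step categorical policy."""
    tokens, logp = [], 0.0
    for t in range(LENGTH):
        p = np.exp(logits[t]) / np.exp(logits[t]).sum()
        tok = rng.choice(VOCAB, p=p)
        tokens.append(int(tok))
        logp += np.log(p[tok])
    return tokens, logp

def reward(tokens):
    """Toy unnormalized posterior: prefers sequences with many 0-tokens."""
    return np.exp(tokens.count(0))

def tb_loss(log_Z, logp_forward, r):
    """Squared trajectory-balance residual to be minimized."""
    return (log_Z + logp_forward - np.log(r)) ** 2

logits = rng.normal(size=(LENGTH, VOCAB))  # stand-in for LLM logits
tokens, logp = sample_trajectory(logits)
loss = tb_loss(log_Z=0.0, logp_forward=logp, r=reward(tokens))
print(loss >= 0.0)  # the TB residual is a squared error
```

In actual GFlowNet fine-tuning, `logits` would come from the LLM being tuned, `log_Z` is a learned scalar, and the loss is minimized by gradient descent over sampled trajectories.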
Publication date: 6 Oct 2023
Project Page: https://github.com/GFNOrg/gfn-lm-tuning
Paper: https://arxiv.org/pdf/2310.04363