This paper addresses the challenge of Transformer models' performance decay on inputs longer than those seen during training. It introduces FIRE, a novel functional relative position encoding with progressive interpolation, to improve Transformers' generalization to longer contexts. The paper proves that FIRE can represent popular relative position encodings such as T5's RPE, ALiBi, and KERPLE. Empirical evidence shows that FIRE models achieve better length generalization on both zero-shot language modeling and long-text benchmarks.
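
The summary above only names the high-level idea, so the following is a minimal PyTorch sketch of what a functional relative position bias with progressive interpolation could look like: a small MLP maps normalized relative distances to per-head attention biases. The specific choices here (the MLP, the log transform, the learnable threshold, and names like `FunctionalRelativeBias`) are illustrative assumptions, not code taken from the paper.

```python
import torch
import torch.nn as nn

class FunctionalRelativeBias(nn.Module):
    """Illustrative functional relative position bias with progressive interpolation.

    The bias for query position i and key position j is produced by a small MLP
    applied to the relative distance (i - j), normalized by a transform of the
    query position so that the MLP's inputs stay in a bounded range even for
    sequence lengths never seen during training.
    """

    def __init__(self, num_heads: int, hidden_dim: int = 32,
                 c: float = 1.0, init_threshold: float = 512.0):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_heads),
        )
        self.c = nn.Parameter(torch.tensor(c))                    # scale inside the log transform
        self.threshold = nn.Parameter(torch.tensor(init_threshold))  # floor for short contexts

    def _psi(self, x: torch.Tensor) -> torch.Tensor:
        # Monotonic transform that compresses large distances (assumed log form).
        return torch.log(torch.abs(self.c) * x + 1.0)

    def forward(self, seq_len: int) -> torch.Tensor:
        # Causal setting: query i attends to keys j <= i.
        positions = torch.arange(seq_len, dtype=torch.float32)
        rel = (positions[:, None] - positions[None, :]).clamp(min=0.0)  # (i - j), shape [L, L]
        # Progressive interpolation: divide by the transformed query position,
        # keeping the MLP's input in [0, 1] regardless of sequence length.
        denom = self._psi(torch.maximum(positions, self.threshold.abs()))[:, None]
        normalized = self._psi(rel) / denom
        bias = self.mlp(normalized.unsqueeze(-1))                  # [L, L, num_heads]
        return bias.permute(2, 0, 1)                               # [num_heads, L, L]

# Usage: add the returned bias to the attention logits before the softmax.
bias = FunctionalRelativeBias(num_heads=8)(seq_len=16)  # shape [8, 16, 16]
```

Because the bias is a learned function of a bounded, length-normalized input rather than a lookup table indexed by raw distance, the same module can in principle be evaluated at context lengths beyond those used in training.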


Publication date: 9 Oct 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2310.04418