Functional Interpolation for Relative Positions Improves Long Context Transformers
This paper addresses the decay in Transformer models’ performance on inputs longer than those seen during training. It introduces FIRE, a novel functional relative position encoding with progressive interpolation,…
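The teaser does not spell out the mechanism, but a minimal sketch of a FIRE-style bias, assuming the functional form from the paper (a learnable function of a log-transformed relative distance, normalized by the query position so long inputs interpolate back into the trained range), might look like this. The function name `fire_bias`, the identity stand-in for the learnable MLP, and the parameter values are illustrative assumptions, not the authors’ code:

```python
import math

def fire_bias(i, j, mlp, L=2048, c=1.0):
    """Sketch of a FIRE-style attention bias for query position i, key position j (j <= i).

    psi(x) = log(c*x + 1) monotonically compresses distances; dividing by
    psi(max(i, L)) ("progressive interpolation") keeps the input to the
    learnable function in [0, 1] even when i exceeds the training length L.
    """
    psi = lambda x: math.log(c * x + 1.0)
    rel = psi(i - j) / psi(max(i, L))  # normalized relative distance in [0, 1]
    return mlp(rel)  # learnable f_theta, e.g. a small MLP; identity here for illustration

# Toy usage beyond the training length: the bias input stays in [0, 1].
b = fire_bias(5000, 4000, mlp=lambda x: x, L=2048)
```

The key point of the sketch is the normalization: without it, a fixed positional function extrapolates poorly past `L`; with it, unseen long-range distances map onto the same input range the model trained on.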