This research paper studies length and compositional generalization in sequence-to-sequence models. These forms of out-of-distribution (OOD) generalization let models handle sequences longer than any seen during training and previously unseen combinations of familiar tokens. The study examines several architectures, including deep sets, transformers, state space models, and simple recurrent neural networks. The results suggest that different degrees of representation identification, such as a linear or a permutation relation between the learned representation and the ground-truth representation, are necessary for these kinds of generalization. While the models provably achieve length and compositional generalization in specific settings, the paper also highlights that their frequent failures in practice remain poorly understood and need further study to guide future designs.
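
To make the two OOD settings concrete, the sketch below is a toy illustration, not code from the paper; the vocabulary, sequence lengths, and held-out token pair are all hypothetical choices. It builds a training set of short sequences, a length-generalization test set of strictly longer sequences, and a compositional-generalization test set in which familiar tokens appear in a combination never seen together during training.

```python
# Toy illustration (not from the paper) of the two OOD evaluation splits:
# - length generalization: test sequences are longer than any training sequence;
# - compositional generalization: test sequences contain a token combination never
#   seen together in training, although each token appears individually in training.
import random

VOCAB = list(range(10))   # hypothetical toy token vocabulary
MAX_TRAIN_LEN = 8         # training sequences are at most this long
MAX_TEST_LEN = 16         # length-generalization test uses longer sequences
HELD_OUT_PAIR = (3, 7)    # token pair withheld from training (compositional split)

def sample_sequence(length, forbid_pair=None):
    """Sample a random token sequence, optionally avoiding a held-out token pair."""
    while True:
        seq = [random.choice(VOCAB) for _ in range(length)]
        if forbid_pair is None or not set(forbid_pair).issubset(seq):
            return seq

def make_splits(n_train=1000, n_test=200):
    # Training: short sequences that never contain the held-out pair together.
    train = [sample_sequence(random.randint(1, MAX_TRAIN_LEN), forbid_pair=HELD_OUT_PAIR)
             for _ in range(n_train)]
    # Length-generalization test: strictly longer sequences than any seen in training.
    length_test = [sample_sequence(random.randint(MAX_TRAIN_LEN + 1, MAX_TEST_LEN))
                   for _ in range(n_test)]
    # Compositional-generalization test: training-length sequences that do contain
    # the held-out pair, i.e. an unseen combination of familiar tokens.
    comp_test = []
    while len(comp_test) < n_test:
        seq = sample_sequence(random.randint(2, MAX_TRAIN_LEN))
        if set(HELD_OUT_PAIR).issubset(seq):
            comp_test.append(seq)
    return train, length_test, comp_test

if __name__ == "__main__":
    train, length_test, comp_test = make_splits()
    print(len(train), len(length_test), len(comp_test))
```

Likewise, the idea of representation identification can be probed informally by asking how well the learned representation is explained by a linear map of the ground-truth one. The helper below is an assumed illustrative check, not the paper's own procedure.

```python
# Minimal sketch (not from the paper): fit a least-squares linear map from the
# ground-truth representation to the learned one and report the variance explained.
import numpy as np

def linear_identification_score(learned, ground_truth):
    """R^2 of the best linear map from ground-truth to learned representations.

    learned, ground_truth: arrays of shape (n_samples, dim).
    A score near 1 suggests the learned representation is (approximately)
    a linear transform of the ground-truth one.
    """
    W, *_ = np.linalg.lstsq(ground_truth, learned, rcond=None)
    residual = learned - ground_truth @ W
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((learned - learned.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy check: a representation that truly is a linear transform scores ~1.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 4))      # stand-in for the ground-truth representation
A = rng.normal(size=(4, 4))        # arbitrary linear map
print(round(linear_identification_score(z @ A, z), 3))  # ~1.0
```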


Publication date: 2024-02-07
Project Page: https://arxiv.org/abs/2402.04875v1
Paper: https://arxiv.org/pdf/2402.04875