The study addresses the challenge of efficiently combining a foundational model with more specialized models to unlock new capabilities. The proposed method, CALM (Composition to Augment Language Models), introduces cross-attention between the models to compose their representations. It extends Large Language Models (LLMs) to new tasks by re-using existing LLMs together with only a small number of additional parameters and a modest amount of data, while keeping the existing model weights frozen so their original capabilities are preserved. The approach applies across diverse domains and settings. The researchers demonstrate that augmenting a model with a smaller one trained on low-resource languages yields significant improvements on tasks such as translation into English and arithmetic reasoning in those languages.
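To make the composition idea concrete, here is a minimal sketch of a cross-attention "composition" layer in the spirit of CALM: both pretrained models stay frozen, and only a small learnable block projects the augmenting model's hidden states into the anchor model's space and lets the anchor attend to them. The class name, dimensions, and the use of random tensors as stand-ins for real model activations are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch only: real CALM composes selected layers of two frozen LLMs;
# here random tensors stand in for their hidden states.
import torch
import torch.nn as nn


class CrossAttentionComposer(nn.Module):
    """Learnable block letting anchor-model states attend to augmenting-model states."""

    def __init__(self, d_anchor: int, d_aug: int, n_heads: int = 4):
        super().__init__()
        # Project augmenting-model representations into the anchor's hidden size.
        self.proj = nn.Linear(d_aug, d_anchor)
        self.cross_attn = nn.MultiheadAttention(d_anchor, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_anchor)

    def forward(self, anchor_h: torch.Tensor, aug_h: torch.Tensor) -> torch.Tensor:
        # anchor_h: (batch, seq, d_anchor), aug_h: (batch, seq, d_aug)
        kv = self.proj(aug_h)
        attended, _ = self.cross_attn(query=anchor_h, key=kv, value=kv)
        # Residual connection keeps the anchor's original representation intact.
        return self.norm(anchor_h + attended)


if __name__ == "__main__":
    batch, seq, d_anchor, d_aug = 2, 16, 512, 256

    # Stand-ins for frozen hidden states from the two pretrained models.
    anchor_states = torch.randn(batch, seq, d_anchor)
    augmenting_states = torch.randn(batch, seq, d_aug)

    composer = CrossAttentionComposer(d_anchor, d_aug)
    fused = composer(anchor_states, augmenting_states)
    print(fused.shape)  # torch.Size([2, 16, 512])
```

Only the composer's parameters would be trained in such a setup, which reflects the paper's premise that the base models' weights remain intact.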

Publication date: 4 Jan 2024
Project Page: https://arxiv.org/abs/2401.02412
Paper: https://arxiv.org/pdf/2401.02412