The article introduces LLaMA Pro, a novel post-pretraining method for Large Language Models (LLMs) that expands a model's Transformer blocks. Unlike LLMs further trained in the conventional way, which can forget old skills while acquiring new ones, a model expanded with LLaMA Pro gains new knowledge efficiently and effectively without catastrophic forgetting. The method was tested on a corpus of code and math, yielding LLaMA Pro-8.3B, a versatile model that excels at general tasks, programming, and mathematics. The model and its instruction-following counterpart, LLaMA Pro-Instruct, achieve strong performance, surpassing existing models in the LLaMA family and showing potential as intelligent agents. The findings provide insights into integrating natural and programming languages, laying a foundation for developing advanced language agents.
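The core idea of block expansion can be illustrated with a minimal sketch, assuming Hugging Face-style `LlamaDecoderLayer` attribute names (`self_attn.o_proj`, `mlp.down_proj`); the function name `expand_blocks`, the `groups` parameter, and the exact initialization scheme here are illustrative assumptions, not the authors' released code:

```python
import copy
import torch.nn as nn

def expand_blocks(layers: nn.ModuleList, groups: int) -> nn.ModuleList:
    """Sketch of block expansion: after every group of existing decoder
    blocks, insert a copy of the group's last block, zero-init its output
    projections so it starts as an identity mapping, and freeze the
    original blocks so only the new ones are trained on the new corpus."""
    group_size = max(1, len(layers) // groups)
    expanded = []
    for i, block in enumerate(layers):
        block.requires_grad_(False)  # preserve original knowledge
        expanded.append(block)
        if (i + 1) % group_size == 0:
            new_block = copy.deepcopy(block)
            new_block.requires_grad_(True)
            # Zeroing the projections that feed the residual stream makes the
            # new block a no-op at initialization, so the expanded model
            # initially behaves exactly like the original one.
            nn.init.zeros_(new_block.self_attn.o_proj.weight)
            nn.init.zeros_(new_block.mlp.down_proj.weight)
            expanded.append(new_block)
    return nn.ModuleList(expanded)
```

In a Hugging Face-style LLaMA checkpoint, one would then (hypothetically) reassign `model.model.layers = expand_blocks(model.model.layers, groups=8)` before post-pretraining on the code-and-math corpus, training only the newly inserted blocks.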

Publication date: 4 Jan 2024
Project Page: https://github.com/TencentARC/LLaMA-Pro
Paper: https://arxiv.org/pdf/2401.02415