The article introduces LLaMA Pro, a novel post-pretraining method for Large Language Models (LLMs) that expands a model's Transformer blocks. Unlike LLMs further trained in the conventional way, which can forget old skills while acquiring new ones, a model expanded with LLaMA Pro gains new knowledge efficiently and effectively without catastrophic forgetting. The method was tested on a corpus of code and math, yielding LLaMA Pro-8.3B, a versatile model that excels at general tasks, programming, and mathematics. The model and its instruction-following counterpart, LLaMA Pro-Instruct, achieve strong performance, surpassing existing models in the LLaMA family and showing potential as intelligent agents. The findings provide insights into integrating natural and programming languages, laying a foundation for developing advanced language agents.
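The core idea of block expansion can be illustrated with a minimal sketch, assuming Hugging Face-style `LlamaDecoderLayer` attribute names (`self_attn.o_proj`, `mlp.down_proj`); the function name `expand_blocks`, the `groups` parameter, and the exact initialization scheme here are illustrative assumptions, not the authors' released code:

```python
import copy
import torch.nn as nn

def expand_blocks(layers: nn.ModuleList, groups: int) -> nn.ModuleList:
    """Sketch of block expansion: after every group of existing decoder
    blocks, insert a copy of the group's last block, zero-init its output
    projections so it starts as an identity mapping, and freeze the
    original blocks so only the new ones are trained on the new corpus."""
    group_size = max(1, len(layers) // groups)
    expanded = []
    for i, block in enumerate(layers):
        block.requires_grad_(False)  # preserve original knowledge
        expanded.append(block)
        if (i + 1) % group_size == 0:
            new_block = copy.deepcopy(block)
            new_block.requires_grad_(True)
            # Zeroing the projections that feed the residual stream makes the
            # new block a no-op at initialization, so the expanded model
            # initially behaves exactly like the original one.
            nn.init.zeros_(new_block.self_attn.o_proj.weight)
            nn.init.zeros_(new_block.mlp.down_proj.weight)
            expanded.append(new_block)
    return nn.ModuleList(expanded)
```

In a Hugging Face-style LLaMA checkpoint, one would then (hypothetically) reassign `model.model.layers = expand_blocks(model.model.layers, groups=8)` before post-pretraining on the code-and-math corpus, training only the newly inserted blocks.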

Publication date: 4 Jan 2024
Project Page: https://github.com/TencentARC/LLaMA-Pro
Paper: https://arxiv.org/pdf/2401.02415