The article presents CompactifAI, a novel compression method for Large Language Models (LLMs) such as ChatGPT and LlaMA, based on quantum-inspired tensor networks. LLMs are advancing AI rapidly, but their enormous size brings high training and inference costs, substantial energy demands, and obstacles to on-site deployment. Traditional compression methods reduce either the number of neurons (pruning, distillation) or the numerical precision of individual weights (quantization). CompactifAI instead targets the model's correlation space, allowing for a more controlled, refined, and interpretable compression, and it can be combined with, or applied on top of, other compression techniques. The article demonstrates that CompactifAI can compress the LlaMA-2 7B model to only 30% of its original size while recovering over 90% of the original accuracy after a brief distributed retraining.
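
To give a feel for what "compressing the correlation space" means, here is a minimal sketch, not the paper's actual algorithm: it truncates the singular-value spectrum of a single weight matrix and replaces it with two thin factors, the simplest instance of the kind of tensor-network-style factorization the paper builds on. The layer size (4096x4096) and the retained rank `chi` are illustrative assumptions, not values from the paper.

```python
# Sketch only: low-rank truncation of one weight matrix as a stand-in for
# correlation-space compression. Real LLM layers are assumed to have
# compressible correlation structure; a random matrix (used here for a
# runnable demo) does not, so the reconstruction error will be large.
import numpy as np


def truncated_factorization(W: np.ndarray, chi: int):
    """Replace W (d_out x d_in) with thin factors A (d_out x chi) and B (chi x d_in)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :chi] * S[:chi]  # absorb the retained singular values into the left factor
    B = Vt[:chi, :]
    return A, B


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4096, 4096)).astype(np.float32)  # hypothetical layer
    chi = 512                                                 # hypothetical retained rank

    A, B = truncated_factorization(W, chi)
    original, compressed = W.size, A.size + B.size
    print(f"parameters: {original} -> {compressed} ({compressed / original:.0%} of original)")
    print("relative reconstruction error:", np.linalg.norm(W - A @ B) / np.linalg.norm(W))
```

In this toy setup, keeping 512 of 4096 singular values shrinks the layer to 25% of its parameters; in practice the truncation would be followed by retraining (as the paper does) to recover accuracy lost in the factorization.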

 

Publication date: 26 Jan 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2401.14109