This paper explores compressing Large Language Models (LLMs) using Low Rank Decomposition (LoRD). The researchers found that the ranks of linear layers in these models can be reduced by up to 39.58% with less than a 1% increase in perplexity, and that the compressed models speed up inference by up to 22.35%. LoRD models also remain compatible with state-of-the-art near-lossless quantization methods such as SpQR, enabling further compression gains when combined with quantization. The study presents LoRD as a promising new paradigm for LLM compression.
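To make the core idea concrete, here is a minimal sketch of low-rank decomposition applied to a single linear layer, using truncated SVD in PyTorch. This is an illustration of the general technique only, not the paper's implementation; the function name `lord_factorize` and the chosen `rank` are hypothetical.

```python
# Illustrative sketch: approximate a Linear layer's weight W (out x in)
# with a rank-r factorization B @ A, implemented as two smaller Linear
# layers. Not the paper's code; names and the rank value are assumptions.
import torch
import torch.nn as nn


def lord_factorize(linear: nn.Linear, rank: int) -> nn.Sequential:
    """Replace `linear` with two smaller layers whose product approximates W."""
    W = linear.weight.data                      # shape: (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]                # fold singular values into U
    Vh_r = Vh[:rank, :]

    down = nn.Linear(linear.in_features, rank, bias=False)
    up = nn.Linear(rank, linear.out_features, bias=linear.bias is not None)
    down.weight.data = Vh_r                     # (rank, in_features)
    up.weight.data = U_r                        # (out_features, rank)
    if linear.bias is not None:
        up.bias.data = linear.bias.data
    return nn.Sequential(down, up)              # y ≈ x @ Vh_r.T @ U_r.T + b


# Example: factorize a 4096x4096 layer at a hypothetical rank of 1024,
# roughly halving its parameter count, and check the approximation error.
layer = nn.Linear(4096, 4096)
compressed = lord_factorize(layer, rank=1024)
x = torch.randn(1, 4096)
rel_err = torch.norm(layer(x) - compressed(x)) / torch.norm(layer(x))
print(f"relative output error: {rel_err:.4f}")
```

Because the factorized form is just two standard linear layers, it stays compatible with downstream weight quantization, which is how the paper is able to stack LoRD with methods like SpQR.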

 

Publication date: 25 Sep 2023
Project Page: https://huggingface.co/nolanoAI
Paper: https://arxiv.org/pdf/2309.14021