This paper explores compressing Large Language Models (LLMs) using Low Rank Decomposition (LoRD). The researchers found that the ranks of linear layers in these models can be reduced by up to 39.58% with less than a 1% increase in perplexity, and that the compressed models speed up inference by up to 22.35%. LoRD models also remain compatible with state-of-the-art near-lossless quantization methods such as SpQR, enabling further compression gains when combined with quantization. The study presents LoRD as a promising new paradigm for LLM compression.
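To make the core idea concrete, here is a minimal sketch of low-rank decomposition applied to a single linear layer, using truncated SVD in PyTorch. This is an illustration of the general technique only, not the paper's implementation; the function name `lord_factorize` and the chosen `rank` are hypothetical.

```python
# Illustrative sketch: approximate a Linear layer's weight W (out x in)
# with a rank-r factorization B @ A, implemented as two smaller Linear
# layers. Not the paper's code; names and the rank value are assumptions.
import torch
import torch.nn as nn


def lord_factorize(linear: nn.Linear, rank: int) -> nn.Sequential:
    """Replace `linear` with two smaller layers whose product approximates W."""
    W = linear.weight.data                      # shape: (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]                # fold singular values into U
    Vh_r = Vh[:rank, :]

    down = nn.Linear(linear.in_features, rank, bias=False)
    up = nn.Linear(rank, linear.out_features, bias=linear.bias is not None)
    down.weight.data = Vh_r                     # (rank, in_features)
    up.weight.data = U_r                        # (out_features, rank)
    if linear.bias is not None:
        up.bias.data = linear.bias.data
    return nn.Sequential(down, up)              # y ≈ x @ Vh_r.T @ U_r.T + b


# Example: factorize a 4096x4096 layer at a hypothetical rank of 1024,
# roughly halving its parameter count, and check the approximation error.
layer = nn.Linear(4096, 4096)
compressed = lord_factorize(layer, rank=1024)
x = torch.randn(1, 4096)
rel_err = torch.norm(layer(x) - compressed(x)) / torch.norm(layer(x))
print(f"relative output error: {rel_err:.4f}")
```

Because the factorized form is just two standard linear layers, it stays compatible with downstream weight quantization, which is how the paper is able to stack LoRD with methods like SpQR.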

 

Publication date: 25 Sep 2023
Project Page: https://huggingface.co/nolanoAI
Paper: https://arxiv.org/pdf/2309.14021