This article introduces L4Q, a novel algorithm for parameter-efficient quantization-aware training of Large Language Models (LLMs). L4Q aims to improve the generality of quantized models by coupling quantization-aware training with low-rank adaptation (LoRA), addressing the challenges posed by non-linearly quantized or mixed-precision weights, which can impede optimal performance. Experiments on the LLaMA and LLaMA2 model families show that L4Q enhances language comprehension and learning performance, achieving high precision while keeping training times comparable.
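To make the general pattern concrete, here is a minimal PyTorch sketch of a linear layer that combines a frozen pretrained weight with trainable low-rank factors and quantizes the merged weight during training, using a learnable scale and a straight-through estimator. This is an illustrative assumption of the technique's shape, not the authors' implementation; the class name `L4QStyleLinear`, the rank, the bit width, and the scale initialization are all hypothetical choices for demonstration.

```python
import torch
import torch.nn as nn

class L4QStyleLinear(nn.Module):
    """Sketch of quantization-aware training combined with LoRA adapters.
    Illustrative only -- not the paper's actual code."""

    def __init__(self, in_features, out_features, rank=8, n_bits=4):
        super().__init__()
        # Frozen pretrained weight (randomly initialized here for the demo).
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight)
        self.weight.requires_grad_(False)

        # Trainable low-rank factors; their product starts at zero so the
        # layer initially matches the pretrained weight.
        self.lora_a = nn.Parameter(torch.zeros(out_features, rank))
        self.lora_b = nn.Parameter(torch.randn(rank, in_features) * 0.01)

        # Learnable per-output-channel quantization scale (rough heuristic init).
        self.scale = nn.Parameter(self.weight.detach().abs().mean(dim=1, keepdim=True) * 2)
        self.qmin = -(2 ** (n_bits - 1))
        self.qmax = 2 ** (n_bits - 1) - 1

    def forward(self, x):
        # Merge the frozen weight with the low-rank update, then quantize the
        # merged weight so training "sees" the quantization error.
        w = self.weight + self.lora_a @ self.lora_b
        w_div = w / self.scale
        # Straight-through estimator: round in the forward pass, but let
        # gradients flow to the LoRA factors and the scale as if rounding
        # were the identity.
        w_round = w_div + (torch.round(w_div) - w_div).detach()
        w_q = torch.clamp(w_round, self.qmin, self.qmax) * self.scale
        return nn.functional.linear(x, w_q)

# Usage: only the LoRA factors and the scale receive gradients.
layer = L4QStyleLinear(512, 512, rank=8, n_bits=4)
out = layer(torch.randn(4, 512))
out.sum().backward()
```

One plausible reading of the design is that quantizing the merged weight, rather than keeping the adapters in high precision alongside a quantized base, lets the adapters compensate for quantization error during training and leaves a fully quantized weight at deployment.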
Publication date: 7 Feb 2024
Paper: https://arxiv.org/pdf/2402.04902