This article introduces L4Q, a novel algorithm for parameter-efficient quantization-aware training of large language models (LLMs). Building on low-rank adaptation (LoRA), L4Q aims to improve the generality of quantized models while avoiding the non-linearly quantized or mixed-precision weights that hinder performance in prior approaches. Experiments on the LLaMA and LLaMA2 model families show that L4Q improves language comprehension and learning ability, achieving high accuracy at low bit precision while keeping training times comparable.
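
To make the idea concrete, below is a minimal sketch (not the authors' released code) of how quantization-aware training can be combined with LoRA in a single linear layer: the frozen base weight and the trainable low-rank update are merged, then fake-quantized with a learnable step size so that gradients reach both the LoRA factors and the quantization scale. The class name `L4QLinear`, the symmetric 4-bit range, and the learnable-step-size scheme are illustrative assumptions based on the summary above.

```python
# Illustrative sketch of QAT combined with LoRA; names and details
# (L4QLinear, 4-bit symmetric range, learnable step size) are assumptions,
# not the paper's exact formulation.
import torch
import torch.nn as nn


class L4QLinear(nn.Module):
    """Linear layer whose effective weight (W + B @ A) is fake-quantized
    with a learnable step size, so gradients flow jointly to the LoRA
    factors and the quantization scale via a straight-through estimator."""

    def __init__(self, in_features, out_features, rank=8, n_bits=4):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight)
        self.weight.requires_grad_(False)            # frozen base weight
        self.lora_a = nn.Parameter(torch.zeros(rank, in_features))
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))
        nn.init.normal_(self.lora_a, std=0.02)
        self.qmax = 2 ** (n_bits - 1) - 1            # symmetric int range
        # Learnable quantization step size, initialized from weight statistics.
        self.step = nn.Parameter(self.weight.abs().mean() * 2 / self.qmax)

    def forward(self, x):
        w = self.weight + self.lora_b @ self.lora_a  # merged effective weight
        q = torch.clamp(w / self.step, -self.qmax - 1, self.qmax)
        # Round with a straight-through estimator to keep training differentiable.
        q = q + (q.round() - q).detach()
        return nn.functional.linear(x, q * self.step)
```

In a full training loop, only `lora_a`, `lora_b`, and `step` would be handed to the optimizer, keeping the trainable-parameter count small; because quantization is applied to the already-merged weight, the result stays a single linearly quantized tensor rather than a mixed-precision one.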

Publication date: 7 Feb 2024
Paper: https://arxiv.org/pdf/2402.04902