This study revisits the pruning of large language models (LLMs), particularly those from the BERT family, in response to the Sparsity May Cry (SMC) benchmark. The benchmark highlights how challenging pruning can be in practice: many established methods fail on it. The authors propose a set of strategies for successful pruning, including a cost-benefit analysis of which model components to prune, a method for scaling training time and learning rate schedules, and proper parameterization of Knowledge Distillation for LLMs. These insights yield state-of-the-art results on both classic BERT-pruning benchmarks and the SMC benchmark.
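To make the Knowledge Distillation point concrete, below is a minimal sketch of a standard distillation loss and its key hyperparameters (temperature and the soft/hard mixing weight). This is the generic formulation from Hinton et al. (2015), not the paper's exact recipe; the default values of `temperature` and `alpha` here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Weighted mix of soft teacher targets and hard ground-truth labels.

    `temperature` and `alpha` are the generic KD hyperparameters whose
    proper tuning the paper emphasizes; the defaults here are placeholders,
    not the values used in the paper.
    """
    # Soften both distributions; scale by T^2 so gradient magnitudes stay
    # comparable across temperatures (standard KD convention).
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.log_softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
        log_target=True,
    ) * temperature ** 2

    # Standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```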

Publication date: 21 Dec 2023
Project Page: https://arxiv.org/abs/2312.13547v1
Paper: https://arxiv.org/pdf/2312.13547