The article addresses two practical obstacles to using pre-trained language models (PLMs) for text representation: high computational cost and high-dimensional embeddings. To overcome these issues, the authors propose IBKD, a knowledge distillation method inspired by the Information Bottleneck principle. IBKD maximizes the mutual information between the final representations of the teacher and student models while minimizing the mutual information between the student's representation and the input data. This encourages the student to retain the information that matters for the downstream task and discard the rest, reducing the risk of overfitting. Empirical studies on Semantic Textual Similarity and Dense Retrieval tasks demonstrate the effectiveness of the proposed method.
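To make the objective concrete, below is a minimal PyTorch sketch of an information-bottleneck-shaped distillation loss of the form described above: maximize an estimate of I(Z_student; Z_teacher) while penalizing an estimate of I(Z_student; X). This is an illustration under stated assumptions, not the paper's implementation: it assumes an InfoNCE-style contrastive lower bound for the teacher-student term and a variational (VIB-style) KL-to-standard-normal upper bound for the input term; the function names, `beta`, and the tensor shapes are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def infonce_lower_bound(student_z, teacher_z, temperature=0.05):
    """Contrastive (InfoNCE-style) lower bound on I(Z_student; Z_teacher).

    Maximizing this term pulls each student representation toward its
    matching teacher representation and away from other in-batch teachers.
    """
    s = F.normalize(student_z, dim=-1)
    t = F.normalize(teacher_z, dim=-1)
    logits = s @ t.T / temperature               # (batch, batch) similarity matrix
    labels = torch.arange(s.size(0), device=s.device)
    return -F.cross_entropy(logits, labels)      # higher value = tighter bound

def kl_to_standard_normal(mu, logvar):
    """Variational (VIB-style) upper bound on I(Z_student; X):
    KL(q(z|x) || N(0, I)) for a diagonal-Gaussian student posterior.

    Penalizing this term discourages the student from encoding input
    information that the teacher representation does not require.
    """
    return 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=-1).mean()

def ib_style_distillation_loss(mu, logvar, teacher_z, beta=0.1):
    """Information-bottleneck-shaped distillation objective:
    minimize  -I(Z_s; Z_t)_estimate + beta * I(Z_s; X)_estimate.
    """
    eps = torch.randn_like(mu)
    student_z = mu + eps * (0.5 * logvar).exp()  # reparameterized sample of Z_student
    return (-infonce_lower_bound(student_z, teacher_z)
            + beta * kl_to_standard_normal(mu, logvar))

# Usage with dummy tensors (hypothetical: batch of 32, embedding dim 128).
mu = torch.randn(32, 128, requires_grad=True)
logvar = torch.zeros(32, 128, requires_grad=True)
teacher_z = torch.randn(32, 128)                 # frozen teacher embeddings
loss = ib_style_distillation_loss(mu, logvar, teacher_z)
loss.backward()
```

In practice, `mu` and `logvar` would come from a small student encoder, and `teacher_z` from the frozen teacher PLM; the weight `beta` trades off fidelity to the teacher against compression of input-specific detail.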

Publication date: 9 Nov 2023
Project Page: https://github.com/Alibaba-NLP/IBKD
Paper: https://arxiv.org/pdf/2311.05472