The article addresses two practical obstacles to using pre-trained language models (PLMs) for text representation: high computational cost and high-dimensional embeddings. To overcome these issues, the authors propose IBKD, a knowledge distillation method inspired by the Information Bottleneck principle. IBKD maximizes the mutual information between the final representations of the teacher and student models while minimizing the mutual information between the student's representation and the input data. This encourages the student to retain the information that matters for the downstream task and discard the rest, reducing the risk of overfitting. Empirical studies on Semantic Textual Similarity and Dense Retrieval tasks demonstrate the effectiveness of the proposed method.
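To make the objective concrete, below is a minimal PyTorch sketch of an information-bottleneck-shaped distillation loss of the form described above: maximize an estimate of I(Z_student; Z_teacher) while penalizing an estimate of I(Z_student; X). This is an illustration under stated assumptions, not the paper's implementation: it assumes an InfoNCE-style contrastive lower bound for the teacher-student term and a variational (VIB-style) KL-to-standard-normal upper bound for the input term; the function names, `beta`, and the tensor shapes are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def infonce_lower_bound(student_z, teacher_z, temperature=0.05):
    """Contrastive (InfoNCE-style) lower bound on I(Z_student; Z_teacher).

    Maximizing this term pulls each student representation toward its
    matching teacher representation and away from other in-batch teachers.
    """
    s = F.normalize(student_z, dim=-1)
    t = F.normalize(teacher_z, dim=-1)
    logits = s @ t.T / temperature               # (batch, batch) similarity matrix
    labels = torch.arange(s.size(0), device=s.device)
    return -F.cross_entropy(logits, labels)      # higher value = tighter bound

def kl_to_standard_normal(mu, logvar):
    """Variational (VIB-style) upper bound on I(Z_student; X):
    KL(q(z|x) || N(0, I)) for a diagonal-Gaussian student posterior.

    Penalizing this term discourages the student from encoding input
    information that the teacher representation does not require.
    """
    return 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=-1).mean()

def ib_style_distillation_loss(mu, logvar, teacher_z, beta=0.1):
    """Information-bottleneck-shaped distillation objective:
    minimize  -I(Z_s; Z_t)_estimate + beta * I(Z_s; X)_estimate.
    """
    eps = torch.randn_like(mu)
    student_z = mu + eps * (0.5 * logvar).exp()  # reparameterized sample of Z_student
    return (-infonce_lower_bound(student_z, teacher_z)
            + beta * kl_to_standard_normal(mu, logvar))

# Usage with dummy tensors (hypothetical: batch of 32, embedding dim 128).
mu = torch.randn(32, 128, requires_grad=True)
logvar = torch.zeros(32, 128, requires_grad=True)
teacher_z = torch.randn(32, 128)                 # frozen teacher embeddings
loss = ib_style_distillation_loss(mu, logvar, teacher_z)
loss.backward()
```

In practice, `mu` and `logvar` would come from a small student encoder, and `teacher_z` from the frozen teacher PLM; the weight `beta` trades off fidelity to the teacher against compression of input-specific detail.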

Publication date: 9 Nov 2023
Project Page: https://github.com/Alibaba-NLP/IBKD
Paper: https://arxiv.org/pdf/2311.05472