The study investigates whether 'complementary' knowledge can be transferred from one pretrained model to another without degrading the recipient's performance. The task is non-trivial because the additional knowledge may reside in a stronger, equally strong, or weaker model. The work first exposes the shortcomings of standard knowledge distillation in this setting, then proposes a more general extension based on data partitioning that enables successful transfer between nearly all pairs of pretrained models and can also be applied without supervision. The study offers an initial, in-depth exploration of the viability of such general-purpose knowledge transfer.
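To make the data-partitioning idea concrete, the sketch below shows one way such a transfer loss could be organized: distill from the teacher only on samples it appears to handle better than the student, and fall back to ordinary supervision elsewhere. This is a minimal illustration of the general idea, not the paper's exact formulation; the confidence-based partition criterion, the cross-entropy fallback, and the function name `partitioned_distillation_loss` are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def partitioned_distillation_loss(student_logits, teacher_logits, labels, tau=2.0):
    """Hedged sketch of data-partitioned knowledge transfer:
    distill on the partition where the teacher looks more knowledgeable,
    keep standard supervision on the rest."""
    # Softened distributions for KL-based distillation.
    t_prob = F.softmax(teacher_logits / tau, dim=-1)
    s_logprob = F.log_softmax(student_logits / tau, dim=-1)

    # Illustrative partition: transfer where the teacher assigns higher
    # confidence to the ground-truth class than the student does.
    t_conf = F.softmax(teacher_logits, dim=-1).gather(1, labels[:, None]).squeeze(1)
    s_conf = F.softmax(student_logits, dim=-1).gather(1, labels[:, None]).squeeze(1)
    transfer_mask = (t_conf > s_conf).float()  # 1 = distill, 0 = keep student supervision

    # Per-sample losses for each partition.
    kd = F.kl_div(s_logprob, t_prob, reduction="none").sum(-1) * tau**2
    ce = F.cross_entropy(student_logits, labels, reduction="none")

    # Combine: distill on the transfer partition, supervise normally on the rest.
    return (transfer_mask * kd + (1.0 - transfer_mask) * ce).mean()
```

An unsupervised variant could replace the label-based criterion and cross-entropy term with purely prediction-based signals (e.g., comparing model confidences or self-distilling the student's own outputs on the non-transfer partition), which is closer in spirit to the unsupervised setting the study mentions.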


Publication date: 26 Oct 2023
Project Page: https://arxiv.org/abs/2310.17653
Paper: https://arxiv.org/pdf/2310.17653