The paper investigates whether 'complementary' knowledge can be transferred between arbitrary pairs of pretrained models without any performance degradation. The authors find that, owing to variations in training, each model learns distinct feature sets from the data, and that for almost any pair of models, one can extract significant data context unavailable to the other, regardless of their relative overall performance. The study exposes the limitations of standard knowledge distillation in this setting and proposes a more general extension based on data partitioning, which enables successful knowledge transfer between nearly all pretrained models. The findings suggest that such robust transfer could unlock auxiliary gains and knowledge fusion from any model repository.
Publication date: 26 Oct 2023
Project Page: https://arxiv.org/abs/2310.17653
Paper: https://arxiv.org/pdf/2310.17653
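As a rough illustration of the data-partitioning idea, the sketch below shows one plausible way to implement it in PyTorch: distill from the teacher only on samples where the teacher is more confident in the ground-truth label than the student, and keep the ordinary supervised loss elsewhere. This is an assumption-laden sketch, not the paper's exact method; the function name `partitioned_distillation_loss` and the specific partitioning criterion are hypothetical.

```python
import torch
import torch.nn.functional as F


def partitioned_distillation_loss(student_logits, teacher_logits, labels, temperature=2.0):
    """Hypothetical sketch of data-partition-based knowledge transfer.

    Per sample, distill from the teacher only where the teacher assigns
    higher probability to the true label than the student does; elsewhere,
    fall back to plain cross-entropy so the student's existing knowledge
    is not overwritten.
    """
    # Per-sample confidence in the ground-truth class.
    student_probs = F.softmax(student_logits, dim=1)
    teacher_probs = F.softmax(teacher_logits, dim=1)
    idx = torch.arange(labels.size(0))
    teacher_better = teacher_probs[idx, labels] > student_probs[idx, labels]

    # Temperature-scaled KL distillation term (teacher -> student), per sample.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.log_softmax(teacher_logits / temperature, dim=1),
        reduction="none",
        log_target=True,
    ).sum(dim=1) * (temperature ** 2)

    # Plain supervised loss on the remaining samples.
    ce = F.cross_entropy(student_logits, labels, reduction="none")

    per_sample = torch.where(teacher_better, kd, ce)
    return per_sample.mean()


# Hypothetical usage inside a training step:
# loss = partitioned_distillation_loss(student(x), teacher(x).detach(), y)
```

The design choice reflected here, selecting the transfer set per sample rather than distilling uniformly, is what lets a weaker teacher still contribute its complementary knowledge without dragging down the student on samples it already handles well.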