TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance

The paper introduces TinyCLIP, a cross-modal distillation method for large-scale language-image pre-trained models such as CLIP. It combines two techniques: affinity mimicking and weight inheritance. Affinity mimicking has the student model imitate how the teacher aligns cross-modal features, matching the teacher in a visual-linguistic affinity space. Weight inheritance transfers pre-trained weights from the teacher model to its student counterpart to make distillation more efficient. TinyCLIP can reduce the size of the pre-trained CLIP ViT-B/32 by 50% while maintaining comparable zero-shot performance. TinyCLIP ViT-8M/16, trained on YFCC-15M, surpasses the original CLIP ViT-B/16 by 3.5% in zero-shot top-1 accuracy on ImageNet while using only 8.9% of the parameters.
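To make the two techniques more concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes that the affinity space is the batch-wise image-text similarity matrix passed through a softmax, that mimicking is scored as cross-entropy between teacher and student affinities in both directions, and that inheritance can be approximated by copying a slice of a teacher weight matrix. The function names, the temperature of 0.07, and the slicing rule are all illustrative assumptions.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def affinity_logits(image_emb, text_emb, temperature):
    """Batch-by-batch matrix of scaled image-text cosine similarities."""
    return l2_normalize(image_emb) @ l2_normalize(text_emb).T / temperature


def affinity_mimicking_loss(s_img, s_txt, t_img, t_txt, temperature=0.07):
    """Cross-entropy between teacher and student affinity distributions.

    Assumption: both directions (image-to-text and text-to-image) are
    weighted equally and share one temperature.
    """
    student = affinity_logits(s_img, s_txt, temperature)
    teacher = affinity_logits(t_img, t_txt, temperature)
    loss = 0.0
    for axis in (1, 0):  # axis=1: image-to-text, axis=0: text-to-image
        teacher_prob = np.exp(log_softmax(teacher, axis))
        student_logp = log_softmax(student, axis)
        loss += -(teacher_prob * student_logp).sum(axis=axis).mean()
    return loss


def inherit_linear_weights(teacher_weight, out_keep, in_keep):
    """Hypothetical inheritance rule: copy the leading block of a teacher matrix."""
    return teacher_weight[:out_keep, :in_keep].copy()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # The teacher and student may use different embedding widths, because
    # only the batch-by-batch affinity matrices are compared.
    t_img, t_txt = rng.normal(size=(4, 64)), rng.normal(size=(4, 64))
    s_img, s_txt = rng.normal(size=(4, 16)), rng.normal(size=(4, 16))
    print(affinity_mimicking_loss(s_img, s_txt, t_img, t_txt))
```

Because the loss compares affinity matrices rather than raw embeddings, the student is free to use a narrower embedding than the teacher. The loss is smallest when the student reproduces the teacher's affinities exactly, which is the sense in which the student "mimics" the teacher.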

Publication date: 22 Sep 2023
Project Page: aka.ms/tinyclip
Paper: https://arxiv.org/pdf/2309.12314
