The paper focuses on Contrastive Language-Image Pre-training (CLIP), a technique widely used in computer vision. The authors argue that CLIP's success comes from its data rather than its architecture or pre-training objective, yet the original CLIP work reveals little about what that data is or how it was collected. The authors set out to reconstruct CLIP's data curation approach and introduce MetaCLIP, a method that takes a raw data pool and metadata (derived from CLIP's concepts) and yields a subset balanced over the metadata distribution, as sketched below. Applied to CommonCrawl, MetaCLIP's curated data outperforms CLIP's original data on multiple standard benchmarks, including zero-shot ImageNet classification. The curation code and the training data distribution over the metadata are made available to the community.
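The core idea of the curation step is to match each caption against the metadata entries by substring search and then down-sample texts that hit overly frequent (head) entries so that no single entry dominates. Below is a minimal, illustrative sketch of that balancing idea; the function names, the toy threshold `t`, and the example data are assumptions for demonstration, not the released MetaCLIP code.

```python
import random

def substr_match(text, metadata):
    """Return indices of metadata entries appearing as substrings of the text."""
    text = text.lower()
    return [i for i, entry in enumerate(metadata) if entry in text]

def curate(texts, metadata, t=2, seed=0):
    """Keep each text with a probability set by its matched entries,
    capping any single entry's expected share at roughly t texts."""
    rng = random.Random(seed)

    # 1) Count how many texts match each metadata entry.
    matches = [substr_match(txt, metadata) for txt in texts]
    counts = [0] * len(metadata)
    for ids in matches:
        for i in ids:
            counts[i] += 1

    # 2) Per-entry sampling probability: tail entries (<= t matches) keep
    #    everything, head entries are down-sampled toward ~t texts each.
    probs = [1.0 if c <= t else t / c for c in counts]

    # 3) Keep a text if it passes the coin flip for its most favorable
    #    matched entry; texts matching no entry are dropped.
    kept = []
    for txt, ids in zip(texts, matches):
        if ids and rng.random() < max(probs[i] for i in ids):
            kept.append(txt)
    return kept

if __name__ == "__main__":
    metadata = ["dog", "cat", "aurora"]
    texts = [
        "a dog running on the beach",
        "my dog sleeping",
        "cute dog photo",
        "dog and cat playing",
        "aurora borealis over iceland",
        "random caption with no concept",
    ]
    print(curate(texts, metadata, t=2))
```

In this toy run, captions mentioning the frequent entry "dog" are kept only about half the time, while the rarer "cat" and "aurora" captions are always kept and the unmatched caption is discarded, flattening the distribution over metadata entries.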

Publication date: 28 Sep 2023
Project Page: https://github.com/facebookresearch/MetaCLIP
Paper: https://arxiv.org/pdf/2309.16671