The academic article focuses on image-text matching, a problem that has drawn attention in both academia and industry. The authors identify a prevalent issue in this field: noisy correspondence (NC), i.e., mismatched image-text pairs that degrade the performance of existing methods. To address it, they propose Cross-modal Robust Complementary Learning (CRCL), a framework that combines an Active Complementary Loss (ACL) with a Self-refining Correspondence Correction (SCC) to improve the robustness of existing methods. The framework is evaluated on three image-text benchmarks, where it demonstrates superior robustness against both synthetic and real-world noisy correspondences.
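To give a flavour of the complementary-learning idea behind ACL, the sketch below shows a generic complementary-label loss for image-text matching in PyTorch. It is not the paper's exact formulation; the function name, temperature value, and loss form are illustrative assumptions. The intuition: instead of asserting "pair (i, i) matches" (a claim that may be wrong under noisy correspondence), it supervises the model with the weaker, more reliable statement "pair (i, j), j ≠ i, does not match".

```python
# Minimal sketch of a complementary-style matching loss (illustrative, not the paper's ACL).
import torch
import torch.nn.functional as F

def complementary_matching_loss(img_emb: torch.Tensor,
                                txt_emb: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """img_emb, txt_emb: (batch, dim) L2-normalised embeddings."""
    # Pairwise cosine similarities scaled by a temperature (value is an assumption).
    logits = img_emb @ txt_emb.t() / temperature      # (B, B)
    probs = logits.softmax(dim=1)                     # P(text j | image i)

    # Complementary labels: every off-diagonal pair is treated as "not a match".
    batch = logits.size(0)
    off_diag = ~torch.eye(batch, dtype=torch.bool, device=logits.device)

    # Minimise -log(1 - p) for pairs that should not match.
    eps = 1e-7
    return -(1.0 - probs[off_diag]).clamp_min(eps).log().mean()

if __name__ == "__main__":
    torch.manual_seed(0)
    img = F.normalize(torch.randn(8, 256), dim=1)
    txt = F.normalize(torch.randn(8, 256), dim=1)
    print(complementary_matching_loss(img, txt))
```

Because complementary labels ("does not match") are far less likely to be wrong than positive labels under NC, a loss of this shape gives lower-risk supervision; for the authors' actual ACL and the SCC correction mechanism, see the paper and project page linked below.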


Publication date: 26 Oct 2023
Project Page: https://github.com/QinYang79/CRCL
Paper: https://arxiv.org/pdf/2310.17468