This paper introduces a proxy-based approach to Deep Metric Learning (DML), a family of models widely used in similarity-based computer vision tasks. The authors add a Soft Orthogonality (SO) constraint on the proxies, which encourages them to be as mutually orthogonal as possible and thereby controls their positions in the embedding space. A Data-Efficient Image Transformer (DeiT) serves as the encoder that extracts contextual features from images, and it is trained with a DML objective that combines the Proxy Anchor loss with the SO regularization. The approach is evaluated on four public benchmarks for category-level image retrieval, with experimental results and ablation studies reported in support of its effectiveness.
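
To make the idea of a soft orthogonality penalty on proxies concrete, here is a minimal sketch in plain Python. It assumes one common formulation, the squared Frobenius norm of (P Pᵀ − I) over L2-normalized proxies P; the summary above does not state the paper's exact formula, so this may differ from the authors' definition. The names `soft_orthogonality_penalty`, `total_loss`, and `so_weight` are hypothetical, and the Proxy Anchor loss is represented only by a placeholder scalar `metric_loss`.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def soft_orthogonality_penalty(proxies):
    """Squared Frobenius norm of (P P^T - I) for row-normalized proxies P.

    Assumed formulation: the penalty is zero when all proxies are mutually
    orthogonal and grows as proxies align with one another.
    """
    unit = [l2_normalize(p) for p in proxies]
    penalty = 0.0
    for i, p in enumerate(unit):
        for j, q in enumerate(unit):
            gram_ij = sum(a * b for a, b in zip(p, q))  # cosine similarity
            target = 1.0 if i == j else 0.0             # identity matrix entry
            penalty += (gram_ij - target) ** 2
    return penalty


def total_loss(metric_loss, proxies, so_weight=0.1):
    """Combine a metric-learning loss value with the weighted SO penalty.

    `metric_loss` stands in for a Proxy Anchor loss computed elsewhere;
    `so_weight` is a hypothetical regularization coefficient.
    """
    return metric_loss + so_weight * soft_orthogonality_penalty(proxies)


if __name__ == "__main__":
    orthogonal = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [1.0, 0.0]]
    print(soft_orthogonality_penalty(orthogonal))
    print(soft_orthogonality_penalty(aligned))
```

In this sketch the penalty is "soft" because it is added to the loss as a weighted term rather than enforced as a hard constraint, so the optimizer can trade off orthogonality against the retrieval objective. In practice the same computation would typically be written with a tensor library so that gradients flow to the proxies.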

Publication date: June 23, 2023
Project Page: N/A
Paper: https://arxiv.org/pdf/2306.13055.pdf