The article presents a new method for Text-based Person Search (TBPS), the task of retrieving images of a specific person based on a textual description. The proposed Vision-Guided Semantic-Group Network (VGSG) extracts well-aligned, fine-grained visual and textual features efficiently and without relying on external tools. VGSG comprises a Semantic-Group Textual Learning (SGTL) module, which groups textual features into semantic groups, and a Vision-guided Knowledge Transfer (VGKT) module, which uses visual local clues to guide the extraction of those textual local features. The authors report that their method outperforms existing approaches on two TBPS benchmarks.
Publication date: 14 Nov 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2311.07514
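For intuition only, here is a minimal PyTorch sketch of the two-stage idea described in the summary: a grouping step that splits textual channels into semantic groups, followed by a cross-attention step in which visual local features guide the textual ones. All class names, dimensions, and design details (channel grouping, max-pooling, single-head attention) are assumptions made for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class SemanticGroupTextual(nn.Module):
    """Hypothetical sketch of an SGTL-style module: split textual channels
    into K groups and pool each group into one local textual feature,
    with no external parsing tools involved."""

    def __init__(self, dim=512, num_groups=4):
        super().__init__()
        assert dim % num_groups == 0
        self.num_groups = num_groups
        self.proj = nn.Linear(dim, dim)

    def forward(self, text_tokens):               # (B, L, C) token features
        x = self.proj(text_tokens)                # (B, L, C)
        B, L, C = x.shape
        # Channel grouping: each channel slice acts as one semantic group.
        x = x.view(B, L, self.num_groups, C // self.num_groups)
        # Max-pool over tokens so each group keeps its most salient cue.
        return x.max(dim=1).values                # (B, K, C // K)


class VisionGuidedTransfer(nn.Module):
    """Hypothetical sketch of a VGKT-style module: visual local features
    attend over the semantic-group textual features, yielding textual
    local features aligned with visual local clues."""

    def __init__(self, dim=128):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, visual_locals, text_groups):  # (B, P, D), (B, K, D)
        q = self.q(visual_locals)                   # visual clues as queries
        k, v = self.k(text_groups), self.v(text_groups)
        attn = torch.softmax(
            q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1
        )                                           # (B, P, K)
        return attn @ v                             # (B, P, D) vision-guided text features


# Toy shapes only; real dimensions would come from the paper's backbones.
text_tokens = torch.randn(2, 64, 512)      # (batch, tokens, channels)
visual_locals = torch.randn(2, 6, 128)     # (batch, visual parts, channels)
groups = SemanticGroupTextual()(text_tokens)              # (2, 4, 128)
aligned = VisionGuidedTransfer()(visual_locals, groups)   # (2, 6, 128)
```

The sketch only illustrates the data flow; matching losses and the knowledge-transfer objectives between the vision-guided and self-attended textual features are left out.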