CLIM: Contrastive Language-Image Mosaic for Region Representation

This paper presents Contrastive Language-Image Mosaic (CLIM), an approach for aligning region and text representations in open-vocabulary object detection. CLIM makes use of large-scale image-text pairs by combining multiple images into a single mosaicked image. Each image in the mosaic is treated as a ‘pseudo region’, and the feature of each pseudo region is trained to be similar to the corresponding text embedding, so the model learns region-text alignment without expensive box annotations. According to the reported experiments, CLIM significantly improves open-vocabulary object detectors.
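
The mosaic idea can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the function names (`make_mosaic`, `contrastive_loss`), the square grid layout, the temperature of 0.07, and the symmetric form of the loss are all assumptions made for illustration. In a real detector the pseudo-region features would come from the model's region feature extractor and the text features from a text encoder; here they are plain vectors.

```python
import math

def make_mosaic(images, grid):
    """Tile grid*grid equally sized images (2D lists) into one mosaic.

    Returns the mosaic and one (x0, y0, x1, y1) box per source image.
    Each box is a 'pseudo region' obtained without any box annotation.
    """
    assert len(images) == grid * grid
    h, w = len(images[0]), len(images[0][0])
    mosaic = [[0] * (w * grid) for _ in range(h * grid)]
    boxes = []
    for idx, img in enumerate(images):
        r, c = divmod(idx, grid)  # row and column of this tile
        for y in range(h):
            mosaic[r * h + y][c * w:(c + 1) * w] = img[y]
        boxes.append((c * w, r * h, (c + 1) * w, (r + 1) * h))
    return mosaic, boxes

def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def _cross_entropy_diag(rows):
    """Mean cross-entropy where the target of row i is column i."""
    total = 0.0
    for i, row in enumerate(rows):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(rows)

def contrastive_loss(region_feats, text_feats, temperature=0.07):
    """Symmetric contrastive loss (assumed form): pseudo region i should
    match text i and not the texts of the other tiles, and vice versa."""
    r = [_normalize(v) for v in region_feats]
    t = [_normalize(v) for v in text_feats]
    n = len(r)
    logits = [
        [sum(a * b for a, b in zip(r[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(transposed))

# Toy usage: four 2x2 "images" form a 2x2 mosaic with four pseudo regions.
images = [[[k] * 2 for _ in range(2)] for k in range(4)]
mosaic, boxes = make_mosaic(images, grid=2)
```

The loss is low when each pseudo-region feature points in the same direction as the embedding of its own caption and high when the pairing is shuffled, which is the behaviour the alignment objective described above relies on.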

Publication date: 19 Dec 2023
Project Page: https://github.com/wusize/CLIM
Paper: https://arxiv.org/pdf/2312.11376
