Language-conditioned Detection Transformer

This paper introduces DECOLA, an open-vocabulary detection framework that learns from both image-level labels and box-level detection annotations. The framework works in three steps: training a language-conditioned object detector, using it to pseudo-label images, and training an unconditioned open-vocabulary detector on the pseudo-annotated images. Because the detector is conditioned on language, it produces more accurate pseudo-labels than previous approaches, and DECOLA shows strong zero-shot performance as a result. The authors report state-of-the-art results across various model sizes, architectures, and datasets.
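The pseudo-labeling step can be illustrated with a small, self-contained sketch. This is not the DECOLA code. The `Box` type, the `toy_detector` stand-in, the `pseudo_label` helper, the score threshold, and the rule of keeping one top-scoring box per label are all assumptions made for illustration. The only idea taken from the summary is that the detector is conditioned on class names, here the image-level labels of the image being annotated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    label: str
    score: float
    xyxy: Tuple[int, int, int, int]  # (x1, y1, x2, y2)


# Hypothetical interface: a conditioned detector takes an image and the class
# names to condition on, and returns candidate boxes for those names only.
ConditionedDetector = Callable[[object, Sequence[str]], List[Box]]


def toy_detector(image: object, names: Sequence[str]) -> List[Box]:
    """Stand-in for a trained language-conditioned detector (fixed outputs)."""
    table: Dict[str, List[Box]] = {
        "cat": [Box("cat", 0.9, (0, 0, 10, 10)), Box("cat", 0.4, (5, 5, 20, 20))],
        "dog": [Box("dog", 0.3, (1, 1, 4, 4))],
        "car": [Box("car", 0.95, (2, 2, 8, 8))],
    }
    return [box for name in names for box in table.get(name, [])]


def pseudo_label(
    image: object,
    image_labels: Sequence[str],
    detector: ConditionedDetector,
    min_score: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[Box]:
    """Keep the top-scoring box per image-level label (a simplification)."""
    best: Dict[str, Box] = {}
    for box in detector(image, image_labels):
        if box.label not in image_labels or box.score < min_score:
            continue
        if box.label not in best or box.score > best[box.label].score:
            best[box.label] = box
    # Return boxes in the order of the image-level labels.
    return [best[name] for name in image_labels if name in best]


if __name__ == "__main__":
    for box in pseudo_label(None, ["cat", "dog"], toy_detector):
        print(box)
```

In this toy setup the "car" box never becomes a pseudo-label for an image tagged only with "cat" and "dog", because the detector is never asked about "car". The resulting boxes would then serve as training targets for the unconditioned detector in the third step.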

Publication date: 29 Nov 2023
Project Page: https://github.com/janghyuncho/DECOLA
Paper: https://arxiv.org/pdf/2311.17902
