The study presents InstaGen, a novel method that improves object detectors by training them on synthetic datasets generated by diffusion models. The authors integrate an instance-level grounding head into a pre-trained generative diffusion model so that it can localize arbitrary instances in the generated images. The grounding head aligns the text embeddings of category names with the regional visual features of the diffusion model. It is trained with supervision from an off-the-shelf object detector, together with a novel self-training scheme for categories the detector does not cover. Experimental results show that InstaGen can serve as a data synthesizer: detectors trained on its generated samples outperform existing state-of-the-art methods in open-vocabulary and data-sparse scenarios.
Publication date: 8 Feb 2024
Project Page: https://fcjian.github.io/InstaGen
Paper: https://arxiv.org/pdf/2402.05937
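
The alignment idea in the summary above (matching category-name text embeddings against regional visual features) can be illustrated with a toy sketch. This is not the authors' implementation: the function names `cosine` and `label_regions`, the two-dimensional vectors, and the `threshold` value are all hypothetical, and cosine similarity is assumed here only as a common choice for comparing embeddings. The paper should be consulted for the actual grounding head design.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def label_regions(region_feats, text_embeds, threshold=0.5):
    """Assign each region its best-matching category name.

    region_feats: list of feature vectors, one per candidate region (hypothetical).
    text_embeds:  dict mapping category name -> text embedding (hypothetical).
    threshold:    assumed cutoff; regions scoring below it get the label None.
    Returns a list of (label_or_None, best_score) tuples.
    """
    labels = []
    for feat in region_feats:
        scores = {name: cosine(feat, emb) for name, emb in text_embeds.items()}
        best = max(scores, key=scores.get)
        score = scores[best]
        labels.append((best if score >= threshold else None, score))
    return labels


# Toy data: two categories and three candidate regions.
text_embeds = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
regions = [[0.9, 0.1], [0.1, 0.8], [-1.0, -1.0]]
result = label_regions(regions, text_embeds)
```

In this sketch the first region is labeled `cat`, the second `dog`, and the third is left unlabeled because it matches neither category well. Discarding low-scoring matches is loosely analogous to keeping only confident pseudo-labels in a self-training setup, though the filtering rule used by InstaGen may differ.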