MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation

The paper presents MosaicFusion, a diffusion-based data augmentation approach for large vocabulary instance segmentation. The method is training-free and does not rely on label supervision. It uses a text-to-image diffusion model as a dataset generator for object instances and mask annotations. It divides an image canvas into several regions and performs a diffusion process that generates multiple instances simultaneously, with the regions conditioned on different text prompts. It then obtains the corresponding instance masks by aggregating the cross-attention maps associated with the object prompts across layers and diffusion time steps, followed by simple thresholding and edge-aware refinement. MosaicFusion can improve the performance of existing instance segmentation models, especially for rare and novel categories.
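The region-splitting and mask-extraction steps described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`split_canvas`, `aggregate_attention`, `threshold_mask`), the regular grid layout, the plain averaging, and the 0.5 threshold are all assumptions made for illustration. The attention maps are taken as already extracted and resized to a common shape, and the edge-aware refinement step is omitted.

```python
import numpy as np


def split_canvas(height, width, rows, cols):
    """Tile a canvas into a rows x cols grid (hypothetical layout).

    Returns a list of (top, left, bottom, right) boxes in row-major order.
    """
    ys = np.linspace(0, height, rows + 1).astype(int)
    xs = np.linspace(0, width, cols + 1).astype(int)
    return [
        (int(ys[r]), int(xs[c]), int(ys[r + 1]), int(xs[c + 1]))
        for r in range(rows)
        for c in range(cols)
    ]


def aggregate_attention(maps):
    """Average same-shaped 2D attention maps and rescale to [0, 1].

    `maps` stands in for the cross-attention maps of one object prompt,
    collected across layers and diffusion time steps (assumed input).
    """
    mean = np.mean(np.stack(maps, axis=0), axis=0)
    lo, hi = mean.min(), mean.max()
    if hi <= lo:
        # A constant map carries no localisation signal.
        return np.zeros_like(mean)
    return (mean - lo) / (hi - lo)


def threshold_mask(attention, threshold=0.5):
    """Binarise a normalised attention map (threshold value is an assumption)."""
    return attention >= threshold


if __name__ == "__main__":
    # Four regions on a 512 x 512 canvas, one per (hypothetical) prompt.
    boxes = split_canvas(512, 512, rows=2, cols=2)
    print(boxes)

    # Random stand-ins for attention maps from several layers and steps.
    rng = np.random.default_rng(0)
    fake_maps = [rng.random((64, 64)) for _ in range(8)]
    mask = threshold_mask(aggregate_attention(fake_maps))
    print(mask.shape, mask.dtype)
```

In a real pipeline each box would be paired with its own text prompt, and the binary mask would be cropped to that box before refinement.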

Publication date: 22 Sep 2023
Project Page: https://github.com/Jiahao000/MosaicFusion
Paper: https://arxiv.org/pdf/2309.13042
