In this paper, the authors introduce Semantic-SAM, a universal image segmentation model designed to segment and recognize objects at any desired granularity. The model offers two key advantages: semantic-awareness and granularity-abundance. For semantic-awareness, it consolidates multiple datasets spanning different granularities and trains with decoupled object and part classification, enabling knowledge transfer across rich semantic concepts at both the object and part level. For multi-granularity capability, it proposes a multi-choice learning scheme in which each click point generates masks at multiple levels, each matched against one of several ground-truth masks. Experiments show the model achieves both semantic-awareness and granularity-abundance, and that jointly training on SA-1B and other segmentation tasks further improves performance.
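To make the multi-choice learning scheme concrete, below is a minimal PyTorch sketch of one plausible reading: each click spawns K granularity-level queries that decode K mask proposals, and Hungarian matching assigns those proposals to the ground-truth masks containing the click. Names such as `MultiGranularityClickHead` and `level_embed` are hypothetical illustrations, not the authors' actual API, and the real model uses a full query-based decoder rather than this single linear head.

```python
# A minimal sketch of multi-choice, multi-granularity click decoding.
# Assumption: a query-based decoder where one learned embedding per
# granularity level is added to the click's position embedding.
import torch
import torch.nn as nn
from scipy.optimize import linear_sum_assignment


class MultiGranularityClickHead(nn.Module):
    """For each click, emit K level queries and decode K mask proposals."""

    def __init__(self, num_levels: int = 6, dim: int = 256):
        super().__init__()
        # One learned content embedding per granularity level (hypothetical).
        self.level_embed = nn.Embedding(num_levels, dim)
        self.mask_head = nn.Linear(dim, dim)

    def forward(self, click_embed: torch.Tensor, pixel_feats: torch.Tensor):
        # click_embed: (B, dim) position embedding of the click point
        # pixel_feats: (B, dim, H, W) per-pixel features from the decoder
        queries = click_embed[:, None, :] + self.level_embed.weight  # (B, K, dim)
        mask_embed = self.mask_head(queries)                         # (B, K, dim)
        # Dot-product decoding: one mask logit map per granularity level.
        return torch.einsum("bkd,bdhw->bkhw", mask_embed, pixel_feats)


def match_levels_to_targets(pred_masks: torch.Tensor, gt_masks: torch.Tensor):
    """Hungarian matching of K predicted masks (K, H, W) against the M
    ground-truth masks (M, H, W) that contain the click, so that each
    level query specializes in a distinct granularity."""
    cost = torch.cdist(pred_masks.flatten(1).sigmoid(),
                       gt_masks.flatten(1).float())
    rows, cols = linear_sum_assignment(cost.detach().cpu().numpy())
    return rows, cols  # prediction index -> matched ground-truth index
```

In this sketch, the many-to-many supervision comes from the matching step: because a single click can sit inside several nested ground-truth masks (e.g., a part and the whole object), each of the K proposals is pushed toward a different one of those targets rather than collapsing to a single scale.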
Publication date: July 10, 2023
Project Page: https://github.com/UX-Decoder/Semantic-SAM
Paper: https://arxiv.org/abs/2307.04767