ODIN (Omni-Dimensional INstance segmentation) is a model proposed to challenge the belief that 2D and 3D perception require distinct model architectures. ODIN can segment and label both 2D RGB images and 3D point clouds using a transformer architecture that alternates between 2D within-view and 3D cross-view information fusion. The model differentiates 2D and 3D feature operations solely through the positional encodings of the tokens involved: 2D pixel coordinates for within-view layers and 3D coordinates for cross-view layers. ODIN achieves state-of-the-art performance on several 3D instance segmentation benchmarks and outperforms all previous works when the sensed 3D point cloud is used in place of a point cloud sampled from the reconstructed 3D mesh.
Publication date: 5 Jan 2024
Project Page: https://odin-seg.github.io
Paper: https://arxiv.org/pdf/2401.02416
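
Below is a minimal sketch (not the official ODIN code) of the alternating fusion idea described above: the same kind of attention layers operate either within each view or across all views jointly, and the only thing that changes is the positional encoding attached to the tokens (2D pixel coordinates vs. 3D coordinates from depth unprojection). All class, function, and variable names here are hypothetical.

```python
import torch
import torch.nn as nn


class AlternatingFusionBlock(nn.Module):
    """One 2D within-view attention layer followed by one 3D cross-view layer.

    Hypothetical illustration of the alternating 2D/3D fusion scheme; the real
    model uses a full U-Net-style backbone and a mask decoder on top.
    """

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.pe_2d = nn.Linear(2, dim)   # encodes (u, v) pixel coordinates
        self.pe_3d = nn.Linear(3, dim)   # encodes (x, y, z) unprojected points
        self.attn_2d = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_3d = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_2d = nn.LayerNorm(dim)
        self.norm_3d = nn.LayerNorm(dim)

    def forward(self, feats, pix_uv, world_xyz):
        # feats:     (views, tokens, dim)  per-view feature tokens
        # pix_uv:    (views, tokens, 2)    2D pixel coordinates per token
        # world_xyz: (views, tokens, 3)    3D coordinates from depth unprojection
        V, N, D = feats.shape

        # 2D within-view fusion: each view attends only to its own tokens,
        # with 2D positional encodings added to the queries/keys.
        q2 = feats + self.pe_2d(pix_uv)
        out2, _ = self.attn_2d(q2, q2, feats)
        feats = self.norm_2d(feats + out2)

        # 3D cross-view fusion: tokens from all views are flattened into one
        # sequence and attend to each other, with 3D positional encodings.
        flat = feats.reshape(1, V * N, D)
        q3 = flat + self.pe_3d(world_xyz.reshape(1, V * N, 3))
        out3, _ = self.attn_3d(q3, q3, flat)
        flat = self.norm_3d(flat + out3)

        return flat.reshape(V, N, D)


if __name__ == "__main__":
    block = AlternatingFusionBlock(dim=256)
    views, tokens = 4, 128
    feats = torch.randn(views, tokens, 256)
    pix_uv = torch.rand(views, tokens, 2)         # normalized pixel coordinates
    world_xyz = torch.rand(views, tokens, 3)      # unprojected 3D points
    print(block(feats, pix_uv, world_xyz).shape)  # torch.Size([4, 128, 256])
```

Because the 2D and 3D paths differ only in their positional encodings, such a block degrades gracefully to a standard single-view 2D model when only one RGB image (and no depth) is available, which is the property the paper highlights.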