ODIN (Omni-Dimensional INstance segmentation) is a model proposed to challenge the belief that 2D and 3D perception require distinct model architectures. ODIN can segment and label both 2D RGB images and 3D point clouds with a single transformer architecture that alternates between 2D within-view and 3D cross-view information fusion. The model distinguishes 2D from 3D operations solely through the positional encodings of the tokens involved: pixel coordinates for 2D tokens and 3D coordinates for 3D tokens. ODIN achieves state-of-the-art performance on several 3D instance segmentation benchmarks and outperforms all prior work when the sensed 3D point cloud is used in place of a point cloud sampled from the 3D mesh.
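
The sketch below illustrates the alternating-fusion idea in PyTorch-style pseudocode; it is a minimal sketch under my own assumptions, not the paper's implementation. The module names, tensor shapes, and the linear layers used as positional encoders are illustrative, and the 3D coordinates are assumed to be precomputed by unprojecting pixels with depth and camera poses.

```python
import torch
import torch.nn as nn


class AlternatingFusionBlock(nn.Module):
    """Illustrative sketch (not the official ODIN code): one 2D within-view
    attention pass followed by one 3D cross-view attention pass, with the two
    passes distinguished only by the positional encodings added to the tokens."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn_2d = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_3d = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.pos_2d = nn.Linear(2, dim)  # assumed encoder for (u, v) pixel coords
        self.pos_3d = nn.Linear(3, dim)  # assumed encoder for (x, y, z) world coords

    def forward(self, feats, pix_coords, world_coords):
        # feats:        (num_views, tokens_per_view, dim)
        # pix_coords:   (num_views, tokens_per_view, 2) per-view pixel positions
        # world_coords: (num_views, tokens_per_view, 3) unprojected 3D positions
        V, T, D = feats.shape

        # 2D within-view fusion: attention runs independently inside each view,
        # with 2D positional encodings added to the tokens.
        x = feats + self.pos_2d(pix_coords)
        x, _ = self.attn_2d(x, x, x)

        # 3D cross-view fusion: tokens from all views are flattened into one
        # sequence so attention spans the whole scene, now with 3D encodings.
        y = (x + self.pos_3d(world_coords)).reshape(1, V * T, D)
        y, _ = self.attn_3d(y, y, y)
        return y.reshape(V, T, D)
```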


Publication date: 5 Jan 2024
Project Page: https://odin-seg.github.io
Paper: https://arxiv.org/pdf/2401.02416