This research presents a Multimodal Interlaced Transformer (MIT) for weakly supervised point cloud segmentation that jointly considers 2D and 3D data. Existing methods require extra 2D annotations to achieve 2D-3D information fusion, which increases the annotation cost. To address this, the study proposes a transformer with two encoders and one decoder that is trained using only scene-level class tags. The encoders compute self-attended features for the 3D point cloud and the 2D multi-view images, respectively, while the decoder performs interlaced 2D-3D cross-attention and fuses the resulting features. Experiments show that MIT performs favorably against existing weakly supervised point cloud segmentation methods.
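
To make the two-encoder, interlaced-decoder idea concrete, below is a minimal PyTorch sketch of one decoder block in that spirit: 3D point tokens cross-attend to 2D image tokens and vice versa, then the two streams are fused. The dimensions, layer layout, fusion rule (mean-pooled 2D context concatenated with each 3D token and projected), and class names are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative sketch only; not the authors' code.
import torch
import torch.nn as nn


class InterlacedCrossAttentionBlock(nn.Module):
    """One decoder block: 3D tokens attend to 2D tokens and vice versa,
    then the two streams are fused (assumed fusion rule, see comments)."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn_3d_to_2d = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_2d_to_3d = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_3d = nn.LayerNorm(dim)
        self.norm_2d = nn.LayerNorm(dim)
        self.fuse = nn.Linear(2 * dim, dim)  # fuses the two attended streams

    def forward(self, feats_3d: torch.Tensor, feats_2d: torch.Tensor):
        # feats_3d: (B, N_points, dim) self-attended point-cloud tokens
        # feats_2d: (B, N_pixels, dim) self-attended multi-view image tokens
        # Interlaced cross-attention: 3D queries gather 2D context, and 2D
        # queries gather 3D context, with residual connections and LayerNorm.
        attended_3d, _ = self.attn_3d_to_2d(feats_3d, feats_2d, feats_2d)
        attended_2d, _ = self.attn_2d_to_3d(feats_2d, feats_3d, feats_3d)
        feats_3d = self.norm_3d(feats_3d + attended_3d)
        feats_2d = self.norm_2d(feats_2d + attended_2d)
        # Assumed fusion: pool the 2D stream and inject it into every 3D token.
        pooled_2d = feats_2d.mean(dim=1, keepdim=True).expand_as(feats_3d)
        fused_3d = self.fuse(torch.cat([feats_3d, pooled_2d], dim=-1))
        return fused_3d, feats_2d


if __name__ == "__main__":
    block = InterlacedCrossAttentionBlock()
    pts = torch.randn(2, 1024, 256)     # tokens from the 3D point encoder
    pix = torch.randn(2, 4 * 196, 256)  # tokens from 4 views of the 2D encoder
    out_3d, out_2d = block(pts, pix)
    print(out_3d.shape, out_2d.shape)   # (2, 1024, 256), (2, 784, 256)
```

In a weakly supervised setting such as the one described, the fused 3D features would typically be pooled into scene-level class predictions so that only scene-level tags are needed for training; per-point labels are then derived from the attention or class activation maps.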


Publication date: 20 Oct 2023
Project Page: https://jimmy15923.github.io/mit_web/
Paper: https://arxiv.org/pdf/2310.12817