This paper investigates how the choice of observation space affects robot learning. The authors compare three modalities: RGB, RGB-D, and point cloud. Across extensive experiments on a range of tasks, they find that point cloud-based methods often outperform their RGB and RGB-D counterparts, both when models are trained from scratch and when they use pretraining. The study suggests that 3D point clouds are a valuable observation modality for complex robotic tasks. The authors intend to open-source their code and checkpoints.
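
To make the relationship between the RGB-D and point cloud modalities concrete, here is a minimal sketch of the standard pinhole back-projection that turns a depth image into 3D points. It is not taken from the paper and is not the authors' pipeline; the function name `depth_to_point_cloud` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical names chosen for illustration, and the tiny depth map is made-up data.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def depth_to_point_cloud(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Back-project a depth image into camera-frame 3D points.

    Assumes a pinhole camera: pixel (u, v) with depth z maps to
    x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):      # v indexes image rows
        for u, z in enumerate(row):      # u indexes image columns
            if z <= 0:
                continue                 # no depth reading at this pixel
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Illustrative 2x2 depth map (made-up values); one pixel has no reading.
depth = [
    [1.0, 0.0],
    [2.0, 2.0],
]
cloud = depth_to_point_cloud(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(cloud)
```

An RGB-D observation keeps depth as an extra image channel on the 2D pixel grid, whereas a point cloud stores the same measurements as an unordered set of 3D coordinates, which is the explicit geometric form the summary above refers to.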

 

Publication date: 6 Feb 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2402.02500