FSD is a self-supervised method for recognizing 3D objects from a single RGB-D image: it predicts each object's 3D shape, size, and 6D pose without requiring CAD models. Training follows a multi-stage pipeline designed to transfer performance from the synthetic to the real-world domain efficiently: the first stage combines 2D and 3D supervised losses on synthetic data, and two further stages add 2D supervised and 3D self-supervised losses on real-world data. On the NOCS test set, FSD outperforms existing self-supervised 6D pose and size estimation baselines, improving 6D pose estimation mAP by 16.4 absolute percentage points, while operating near real time at 5 Hz.
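The staged loss schedule can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the stage numbering and the loss names (`2d_sup`, `3d_sup`, `3d_self`) are assumptions, and the loss values are dummy scalars standing in for real loss terms.

```python
def total_loss(stage, losses):
    """Combine per-term losses according to the training stage.

    Hypothetical schedule, following the description above:
      stage 1 (synthetic data): 2D supervised + 3D supervised
      stages 2-3 (real data):   2D supervised + 3D self-supervised
    """
    if stage == 1:
        return losses["2d_sup"] + losses["3d_sup"]
    if stage in (2, 3):
        return losses["2d_sup"] + losses["3d_self"]
    raise ValueError(f"unknown stage: {stage}")

# Dummy scalar losses, for illustration only.
synthetic_losses = {"2d_sup": 0.5, "3d_sup": 0.25}
real_losses = {"2d_sup": 0.25, "3d_self": 0.125}

print(total_loss(1, synthetic_losses))  # -> 0.75
print(total_loss(2, real_losses))       # -> 0.375
```

In a real training loop, each stage would run this combination over its own dataset (synthetic first, then real) before moving to the next stage.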
Publication date: 20 Oct 2023
Project Page: fsd6d.github.io
Paper: https://arxiv.org/pdf/2310.12974