Act3D is a robotic manipulation policy that leverages 3D perceptual representations for high-precision end-effector pose prediction. The system is designed to overcome the computational cost of high-resolution 3D perceptual grids, which are typically needed for accurate pose prediction but are expensive to process. Act3D achieves this by casting 6-DoF keypose prediction as 3D detection with adaptive spatial computation: it takes 3D feature clouds as input, samples 3D point grids in a coarse-to-fine manner, and selects the best-scoring point for end-effector pose prediction. The system demonstrated significant improvements over previous state-of-the-art models on RLBench, an established manipulation benchmark.
Publication date: June 30, 2023
Project Page: https://act3d.github.io/
Paper: https://arxiv.org/pdf/2306.17817.pdf
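Below is a minimal sketch of the coarse-to-fine point-sampling idea described above, not the authors' implementation. The function name, the distance-weighted feature pooling (a stand-in for the paper's attention over the feature cloud), and all hyperparameters (`num_samples`, `num_stages`, `shrink_factor`) are illustrative assumptions.

```python
import torch

def coarse_to_fine_point_selection(
    cloud_xyz: torch.Tensor,    # (N, 3) 3D positions of scene feature points
    cloud_feat: torch.Tensor,   # (N, C) per-point features
    query_feat: torch.Tensor,   # (C,) task-conditioned query feature
    workspace_bounds: tuple,    # ((xmin, ymin, zmin), (xmax, ymax, zmax))
    num_samples: int = 1000,
    num_stages: int = 3,
    shrink_factor: float = 0.25,
) -> torch.Tensor:
    """Sample 3D candidate points coarse-to-fine and return the highest-scoring one.

    Each candidate point is featurized by distance-weighted pooling of nearby
    scene features (an assumed stand-in for attention over the feature cloud),
    scored against the query, and the sampling region shrinks around the best
    candidate at every stage.
    """
    lo = torch.tensor(workspace_bounds[0], dtype=torch.float32)
    hi = torch.tensor(workspace_bounds[1], dtype=torch.float32)
    best_point = (lo + hi) / 2          # start from the workspace center
    half_extent = (hi - lo) / 2         # current sampling radius per axis

    for _ in range(num_stages):
        # Uniformly sample candidate points inside the current region.
        candidates = best_point + (torch.rand(num_samples, 3) * 2 - 1) * half_extent

        # Featurize candidates by softmax-weighted pooling over scene features.
        dists = torch.cdist(candidates, cloud_xyz)       # (S, N)
        weights = torch.softmax(-dists, dim=-1)          # (S, N)
        candidate_feats = weights @ cloud_feat           # (S, C)

        # Score candidates against the query and keep the best one.
        scores = candidate_feats @ query_feat            # (S,)
        best_point = candidates[scores.argmax()]

        # Shrink the sampling region around the current best point.
        half_extent = half_extent * shrink_factor

    return best_point  # predicted 3D position for the end-effector keypose
```

In this sketch the selected 3D point would supply the position of the keypose; a full system would additionally regress or classify the end-effector rotation and gripper state from the selected point's features.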