The article introduces DODUO, a method for learning general dense visual correspondence from in-the-wild images and videos without ground-truth supervision. Dense visual correspondence plays a critical role in robotic perception. DODUO estimates a dense flow field that encodes the displacement of each pixel in one image to its corresponding pixel in the other image. Supervisory signals for training come from flow-based warping: one image is warped toward the other using the estimated flow, and the discrepancy between the warped result and the actual image provides a self-supervised training signal. Semantic priors are combined with this self-supervised flow training to produce accurate dense correspondences that remain robust to dynamic changes in the scene. DODUO outperforms existing self-supervised correspondence learning baselines on point-level correspondence estimation and has practical applications in robotics.
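The warping-as-supervision idea can be sketched in a few lines. The following is a minimal, dependency-light illustration, not DODUO's actual implementation: the function names are hypothetical, nearest-neighbor sampling stands in for the bilinear sampling a real system would use, and DODUO itself trains with richer losses than raw photometric error.

```python
import numpy as np

def warp_with_flow(src, flow):
    """Warp a source image into the target frame using a dense flow field.

    src:  (H, W) grayscale image.
    flow: (H, W, 2) per-pixel displacement (dx, dy) mapping each target
          pixel to its corresponding location in `src`.
    Nearest-neighbor sampling keeps this sketch dependency-free; a real
    implementation would use differentiable bilinear sampling.
    """
    h, w = src.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Sample coordinates in the source image, clipped to stay in bounds.
    sx = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    sy = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return src[sy, sx]

def photometric_loss(warped, target):
    """Mean absolute difference: a simple self-supervised training signal."""
    return float(np.mean(np.abs(warped - target)))

# Toy check: the target is the source shifted right by one pixel, and the
# flow field points each target pixel back to its source location, so the
# warped image matches the target on interior pixels (the leftmost column
# is clipped at the image boundary).
src = np.arange(16, dtype=float).reshape(4, 4)
target = np.roll(src, 1, axis=1)   # source shifted right by one pixel
flow = np.zeros((4, 4, 2))
flow[..., 0] = -1                  # each target pixel looks one pixel left
warped = warp_with_flow(src, flow)
```

A correct flow field drives the loss toward zero, which is what lets the flow be learned without ground-truth correspondence labels.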
Publication date: 27 Sep 2023
Project Page: https://ut-austin-rpl.github.io/Doduo/
Paper: https://arxiv.org/pdf/2309.15110