This paper presents a new approach for segmenting and tracking moving objects in complex visual scenes. The method refines masks predicted from optical flow using appearance-based information: a simple selection mechanism identifies the flow-predicted masks that are accurate, and an object-centric architecture refines the problematic ones. The model is pre-trained on synthetic data and then adapted to real-world videos in a self-supervised manner. Evaluated on multiple video segmentation benchmarks, it outperforms existing models on multi-object segmentation tasks.

Publication date: 19 Dec 2023
Project Page: https://arxiv.org/pdf/2105.05501.pdf
Paper: https://arxiv.org/pdf/2312.11463