The article introduces Cutie, a video object segmentation (VOS) network built around object-level memory reading. Whereas previous VOS systems rely on bottom-up, pixel-level memory reading, Cutie adds a top-down path in which object queries act as a high-level summary of the target object. This summary helps the network separate the foreground object from the background more cleanly. Evaluated on the challenging MOSE dataset, Cutie is reported to improve significantly over other methods.
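
To make the idea of object queries summarizing a target more concrete, here is a minimal sketch in plain Python. It is not Cutie's actual implementation: the function names (`masked_read`, `object_level_read`), the single-head dot-product attention without learned projections, and the choice to let the first half of the queries look only at foreground pixels and the second half only at background pixels are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def masked_read(queries, pixels, allowed):
    """Bottom-up step: each query pools the pixels it is allowed to see.

    allowed[q][p] is True when query q may attend to pixel p.
    A query with no allowed pixels falls back to attending to all of them.
    """
    summaries = []
    for q, query in enumerate(queries):
        idx = [p for p in range(len(pixels)) if allowed[q][p]]
        idx = idx or list(range(len(pixels)))
        weights = softmax([dot(query, pixels[p]) for p in idx])
        summaries.append([
            sum(w * pixels[p][d] for w, p in zip(weights, idx))
            for d in range(len(query))
        ])
    return summaries


def object_level_read(pixels, queries, fg_mask):
    """Hypothetical object-level read for one frame.

    pixels  : list of per-pixel feature vectors
    queries : list of object query vectors (same dimension as pixels)
    fg_mask : list of bools, True where the pixel belongs to the target
    """
    half = len(queries) // 2
    # Assumption: first half of the queries sees foreground, the rest background.
    allowed = [
        [(m if q < half else not m) for m in fg_mask]
        for q in range(len(queries))
    ]
    summaries = masked_read(queries, pixels, allowed)

    # Top-down step: every pixel reads from the object summaries and
    # adds the result to its own feature (a residual update).
    updated = []
    for pix in pixels:
        weights = softmax([dot(pix, s) for s in summaries])
        read = [
            sum(w * s[d] for w, s in zip(weights, summaries))
            for d in range(len(pix))
        ]
        updated.append([p + r for p, r in zip(pix, read)])
    return summaries, updated


# Toy frame: two foreground-like and two background-like pixels.
pixels = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
fg_mask = [True, True, False, False]
queries = [[1.0, 0.0], [0.0, 1.0]]  # one foreground query, one background query
summaries, updated = object_level_read(pixels, queries, fg_mask)
```

In this toy setup the first summary is built only from foreground pixels and the second only from background pixels, so each pixel's update is dominated by the summary of the region it resembles. A real model would use learned projections, several attention heads, and repeated query-pixel interaction.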
Publication date: 19 Oct 2023
Project Page: https://hkchengrex.github.io/Cutie
Paper: https://arxiv.org/pdf/2310.12982