Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation
The paper introduces a new method for self-supervised video object segmentation (VOS) by leveraging inherent structural dependencies in DINO-pretrained Transformers. Instead of resorting to auxiliary modalities or iterative slot attention,…