This article surveys unsupervised object localization methods, which discover objects in images without any manual annotation. Such methods matter for open-world vision systems, including autonomous robots and cars, augmented reality headsets, and visual search engines. The survey attributes recent progress in unsupervised localization to Vision Transformer (ViT) models and self-supervised representation learning. Trained only on the weak signal provided by pretext tasks, these models extract semantically meaningful local and global features, removing the need for carefully hand-crafted methods, generative adversarial models, or the interpretation of thousands of object prototypes.
Publication date: 19 Oct 2023
Project Page: https://github.com/valeoai/Awesome-Unsupervised-Object-Localization
Paper: https://arxiv.org/pdf/2310.12904