The All-Seeing Project combines a large-scale dataset and a model for recognizing and understanding everything in the open world. Its new dataset, AS-1B, contains over 1 billion regions annotated with semantic tags, question-answering pairs, and detailed captions. The dataset covers 3.5 million common and rare real-world concepts and comprises 132.2 billion tokens describing those concepts and their attributes. The project also introduces the All-Seeing model (ASM), a unified framework for panoptic visual recognition and understanding. Trained with open-ended language prompts and locations, ASM generalizes to a variety of vision and language tasks with remarkable zero-shot performance. The project aims to provide a foundation for vision-language artificial general intelligence research.
Publication date: 3 August 2023
Project Page: https://github.com/OpenGVLab/all-seeing
Paper: https://arxiv.org/pdf/2308.01907.pdf