This paper introduces One-shot Open Affordance Learning (OOAL), a setting in which a model is trained with just one example per base object category yet is expected to identify novel objects and their affordances. The authors conduct a comprehensive analysis of existing foundation models to probe their understanding of affordances and to assess the potential for data-limited affordance learning. They then propose a vision-language framework that strengthens the alignment between visual features and affordance text embeddings. The method outperforms state-of-the-art models with less than 1% of the full training data, showing good generalization to unseen objects and affordances.
Publication date: 29 Nov 2023
arXiv page: https://arxiv.org/abs/2311.17776v1
Paper: https://arxiv.org/pdf/2311.17776
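To make the core idea concrete, here is a minimal sketch (not the authors' implementation) of aligning dense visual features with affordance text embeddings: per-pixel cosine similarity against each affordance prompt yields an affordance map. The encoders, feature dimension, temperature, and affordance vocabulary are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def affordance_maps(visual_feats: torch.Tensor,
                    text_embeds: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    """
    visual_feats: (B, C, H, W) dense features from a frozen vision backbone.
    text_embeds:  (K, C) embeddings of K affordance prompts
                  (e.g. "hold", "cut", "pour") from a frozen text encoder.
    Returns:      (B, K, H, W) per-pixel affordance distribution.
    """
    v = F.normalize(visual_feats, dim=1)   # unit-norm feature per pixel
    t = F.normalize(text_embeds, dim=1)    # unit-norm embedding per prompt
    # cosine similarity between every pixel feature and every affordance text
    sims = torch.einsum("bchw,kc->bkhw", v, t) / temperature
    return sims.softmax(dim=1)             # distribution over affordances

# usage with random stand-ins for real encoder outputs
maps = affordance_maps(torch.randn(1, 512, 32, 32), torch.randn(6, 512))
print(maps.shape)  # torch.Size([1, 6, 32, 32])
```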