The paper presents a method for leveraging pretrained Vision Language Models (VLMs) to annotate 3D objects while accounting for factors that influence the VLM's response, such as the object's full appearance across rendered views and the phrasing of the question. The study shows that aggregating answers over these factors can outperform summarization by a language model and can improve downstream VLM predictions. Evaluated on the large-scale Objaverse dataset, the approach shows that VLMs can approach the quality of human-verified type and material annotations without additional training.
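A minimal sketch of the view-aggregation idea described above, not the authors' implementation: the same question is scored by a VLM against several rendered views of one object, and the per-view answer distributions are averaged rather than trusting any single view. The candidate labels, scores, and function name below are illustrative placeholders.

```python
import numpy as np


def aggregate_view_scores(per_view_scores: np.ndarray) -> np.ndarray:
    """Average per-view probability distributions over candidate answers.

    per_view_scores: shape (num_views, num_candidates); each row is the
    distribution a VLM assigns to the candidate answers for one rendered view.
    Returns a single distribution of shape (num_candidates,).
    """
    # Marginalize over views with uniform weights (a simple aggregation choice).
    return per_view_scores.mean(axis=0)


if __name__ == "__main__":
    candidates = ["chair", "table", "lamp"]  # hypothetical object-type labels
    # Hypothetical per-view VLM scores for three renders of the same object.
    scores = np.array([
        [0.70, 0.20, 0.10],
        [0.55, 0.35, 0.10],
        [0.60, 0.30, 0.10],
    ])
    agg = aggregate_view_scores(scores)
    print("aggregated:", dict(zip(candidates, agg.round(3))))
    print("predicted type:", candidates[int(np.argmax(agg))])
```

This illustrates only the aggregation step; the paper's pipeline covers how views are rendered, how questions are posed, and which factors are marginalized.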


Publication date: 29 Nov 2023
arXiv Abstract: https://arxiv.org/abs/2311.17851v1
Paper: https://arxiv.org/pdf/2311.17851