The paper presents a method for leveraging pretrained Vision Language Models (VLMs) to annotate 3D objects by aggregating over factors that affect a VLM's response, such as the rendered views of an object and the phrasing of the question. The study shows that this aggregation can outperform summarizing per-view captions with a language model and also improves downstream VLM predictions. Tested on the large-scale Objaverse dataset, the approach shows that VLMs can approach the quality of human-verified type and material annotations without additional training.
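To make the aggregation idea concrete, here is a minimal illustrative sketch (not the authors' code): given a VLM's scores for a set of candidate labels across several rendered views of one object, the per-view scores are marginalized into a single distribution over labels. The label set and the score values below are mock placeholders; in practice each entry would be a VLM's image-text likelihood for one (view, label) pair.

```python
# Illustrative sketch: marginalizing per-view VLM label scores for a 3D object.
import numpy as np

candidate_labels = ["chair", "table", "lamp"]  # hypothetical label set

# view_label_logls[i, j]: mock log-likelihood of label j given rendered view i
view_label_logls = np.array([
    [-0.2, -1.5, -2.0],  # view 1
    [-0.4, -1.2, -2.2],  # view 2
    [-0.3, -1.6, -1.9],  # view 3
])

# Treat views as equally likely and marginalize: p(label) ~ sum over views of p(label | view)
view_weights = np.full(view_label_logls.shape[0], 1.0 / view_label_logls.shape[0])
label_probs = view_weights @ np.exp(view_label_logls)
label_probs /= label_probs.sum()

best_label = candidate_labels[int(np.argmax(label_probs))]
print(dict(zip(candidate_labels, label_probs.round(3))), "->", best_label)
```

Because the aggregation operates on the VLM's own likelihoods rather than on free-form text captions, it avoids merging potentially contradictory per-view descriptions with a separate language model.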
Publication date: 29 Nov 2023
Project Page: https://arxiv.org/abs/2311.17851v1
Paper: https://arxiv.org/pdf/2311.17851