vision language models

SonicVisionLM: Playing Sound with Vision Language Models

root, January 11, 2024

SonicVisionLM is a novel framework that generates sound effects for silent videos by leveraging vision language models (VLMs). Instead of creating sound from visual representations, which can be…

Computer Vision and Pattern Recognition

Evaluating VLMs for Score-Based, Multi-Probe Annotation of 3D Objects

root, November 30, 2023

The paper presents a method for using pretrained Vision Language Models (VLMs) to annotate 3D objects, taking into account each object's full appearance, the phrasing of the question, and other influencing factors. The study…
