The paper introduces MM-Vet, an evaluation benchmark tailored for Large Multimodal Models (LMMs) on complicated multimodal tasks. LMMs, which augment large language models (LLMs) with multimodal inputs, have shown an impressive ability to tackle challenging tasks, from explaining visual jokes to reasoning about current events. MM-Vet evaluates these models on how well they integrate core vision-language (VL) capabilities, rather than on isolated skills. The benchmark defines six core VL capabilities (recognition, OCR, knowledge, language generation, spatial awareness, and math) and assesses the 16 capability integrations derived from combining them. This structure aims to provide a systematic way to quantify how effective and capable LMMs are across diverse scenarios.
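To make the capability-integration structure concrete, here is a minimal Python sketch of how scores could be rolled up per integration. The sample records, field names, and score values are hypothetical illustrations, not actual MM-Vet data or the paper's official evaluation code.

```python
# The six core vision-language capabilities defined by MM-Vet.
CAPABILITIES = ["recognition", "ocr", "knowledge",
                "language_generation", "spatial_awareness", "math"]

# Hypothetical per-sample records: each benchmark question is tagged with
# the subset of capabilities it requires, plus a score in [0, 1].
# (Illustrative data only, not taken from the benchmark.)
samples = [
    {"caps": ("ocr", "math"), "score": 0.8},
    {"caps": ("recognition", "knowledge", "language_generation"), "score": 0.4},
    {"caps": ("ocr", "math"), "score": 1.0},
]

def scores_by_integration(samples):
    """Average score for each capability integration (combination)."""
    totals = {}
    for s in samples:
        key = tuple(sorted(s["caps"]))       # canonical integration id
        bucket = totals.setdefault(key, [0.0, 0])
        bucket[0] += s["score"]
        bucket[1] += 1
    return {k: total / n for k, (total, n) in totals.items()}

print(scores_by_integration(samples))
# {('math', 'ocr'): 0.9, ('knowledge', 'language_generation', 'recognition'): 0.4}
```

In the paper itself, each open-ended model output is first graded by an LLM-based evaluator before any per-integration aggregation; the sketch above only illustrates how per-sample scores roll up into the 16 capability integrations.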

Publication date: Aug 7, 2023
Project Page: https://github.com/yuweihao/MM-Vet
Paper: https://arxiv.org/pdf/2308.02490.pdf