The paper introduces CMMU, a benchmark for evaluating the understanding and reasoning abilities of multi-modal large language models (MLLMs) in Chinese. It comprises 3,603 questions spanning 7 subjects and covering knowledge from primary school through high school, in three formats: multiple-choice, multiple-response, and fill-in-the-blank. The paper also proposes ShiftCheck, an evaluation strategy for multiple-choice questions designed to reduce position bias and the influence of random guessing on correctness. Evaluating seven open-source MLLMs, the authors find that CMMU poses a significant challenge to current models.
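The summary describes ShiftCheck only at a high level; below is a minimal Python sketch of the general idea, under the assumption that the strategy circularly shifts the answer options and counts a prediction correct only if the model tracks the true option through every shift. The function name `shiftcheck`, the all-shifts-must-pass rule, and the `ask_model` callable are illustrative assumptions, not the paper's exact implementation.

```python
import string

def shiftcheck(question, options, ask_model):
    """Sketch of a ShiftCheck-style evaluation (details assumed).

    `options` lists the answer choices with the ground-truth answer
    first; `ask_model` is a hypothetical callable that takes a prompt
    and returns the model's chosen option letter (e.g. "B").
    """
    n = len(options)
    labels = string.ascii_uppercase[:n]
    correct_text = options[0]  # ground truth, by convention above

    for shift in range(n):
        # Circularly shift the options so the correct answer moves
        # through every position across the n variants.
        shifted = options[shift:] + options[:shift]
        prompt = question + "\n" + "\n".join(
            f"{lab}. {opt}" for lab, opt in zip(labels, shifted)
        )
        predicted_label = ask_model(prompt)            # hypothetical model call
        true_label = labels[shifted.index(correct_text)]
        if predicted_label != true_label:
            return False  # assumption: any miss under a shift counts as wrong
    return True
```

Requiring consistency across all shifted variants means a model cannot score by always favoring one option position, which is the position bias the strategy targets.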
Publication date: 26 Jan 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2401.14011