This paper introduces ArcMMLU, a new Chinese-language benchmark for Large Language Models (LLMs) tailored to the Library & Information Science (LIS) domain. It measures the knowledge and reasoning capabilities of LLMs across four key sub-domains: Archival Science, Data Science, Library Science, and Information Science. The authors find that while most mainstream LLMs achieve an average accuracy above 50% on ArcMMLU, there remains considerable room for improvement. The study fills a critical gap in LLM evaluation for the Chinese LIS domain and paves the way for future development of LLMs tailored to this specialized area.
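
To make the evaluation setup concrete, below is a minimal sketch of how per-sub-domain accuracy on a multiple-choice benchmark like ArcMMLU could be computed. The record fields (`subdomain`, `question`, `options`, `answer`) and the `predict` callback are illustrative assumptions for this sketch, not the repository's actual data schema or API.

```python
from collections import defaultdict

def score(records, predict):
    """Compute per-sub-domain and average accuracy.

    records: iterable of dicts with keys 'subdomain', 'question',
             'options' (list of four choices), and 'answer' ('A'-'D').
             This layout is an assumption for illustration.
    predict: callable (question, options) -> 'A'..'D', i.e. the model under test.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        pred = predict(r["question"], r["options"])
        total[r["subdomain"]] += 1
        if pred == r["answer"]:
            correct[r["subdomain"]] += 1
    per_domain = {d: correct[d] / total[d] for d in total}
    average = sum(per_domain.values()) / len(per_domain)
    return per_domain, average

# Toy usage with a dummy predictor that always answers 'A'
sample = [
    {"subdomain": "Archival Science", "question": "…", "options": ["…"] * 4, "answer": "A"},
    {"subdomain": "Library Science", "question": "…", "options": ["…"] * 4, "answer": "C"},
]
print(score(sample, lambda question, options: "A"))
```

The average reported in the paper is an accuracy over the four sub-domains; this sketch simply mirrors that aggregation with a plug-in model predictor.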

Publication date: 1 Dec 2023
Project Page: https://github.com/stzhang-patrick/ArcMMLU
Paper: https://arxiv.org/pdf/2311.18658