This paper introduces ArcMMLU, a new Chinese-language benchmark for Large Language Models (LLMs) tailored to the Library & Information Science (LIS) domain. It measures the knowledge and reasoning capabilities of LLMs across four key sub-domains: Archival Science, Data Science, Library Science, and Information Science. The authors find that while most mainstream LLMs achieve an average accuracy above 50% on ArcMMLU, there remains considerable room for improvement. The study fills a critical gap in LLM evaluation for the Chinese LIS domain and paves the way for future development of LLMs tailored to this specialized area.
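
To make the evaluation setup concrete, below is a minimal sketch of how per-sub-domain accuracy on a multiple-choice benchmark like ArcMMLU could be computed. The record fields (`subdomain`, `question`, `options`, `answer`) and the `predict` callback are illustrative assumptions for this sketch, not the repository's actual data schema or API.

```python
from collections import defaultdict

def score(records, predict):
    """Compute per-sub-domain and average accuracy.

    records: iterable of dicts with keys 'subdomain', 'question',
             'options' (list of four choices), and 'answer' ('A'-'D').
             This layout is an assumption for illustration.
    predict: callable (question, options) -> 'A'..'D', i.e. the model under test.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        pred = predict(r["question"], r["options"])
        total[r["subdomain"]] += 1
        if pred == r["answer"]:
            correct[r["subdomain"]] += 1
    per_domain = {d: correct[d] / total[d] for d in total}
    average = sum(per_domain.values()) / len(per_domain)
    return per_domain, average

# Toy usage with a dummy predictor that always answers 'A'
sample = [
    {"subdomain": "Archival Science", "question": "…", "options": ["…"] * 4, "answer": "A"},
    {"subdomain": "Library Science", "question": "…", "options": ["…"] * 4, "answer": "C"},
]
print(score(sample, lambda question, options: "A"))
```

The average reported in the paper is an accuracy over the four sub-domains; this sketch simply mirrors that aggregation with a plug-in model predictor.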

Publication date: 1 Dec 2023
Project Page: https://github.com/stzhang-patrick/ArcMMLU
Paper: https://arxiv.org/pdf/2311.18658