Researchers from Fudan University, The University of Hong Kong, and the University of Illinois Urbana-Champaign propose L-Eval, an evaluation benchmark for Long Context Language Models (LCLMs). The work aims to close the gap in standardized evaluation and to assess how useful these models are in practice. L-Eval comprises 411 long documents and over 2,000 query-response pairs drawn from diverse domains. Preliminary findings suggest that expanding the context length of language models is a promising direction.
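
To make the benchmark's structure concrete, below is a minimal sketch of how long-document query-response pairs could be scored against a model. The record layout (`document`, `instructions`, `outputs`), the `generate` callable, and the exact-match metric are illustrative assumptions, not L-Eval's actual data format or evaluation code; see the project page for the real pipeline.

```python
# Minimal sketch of long-document QA evaluation in the spirit of L-Eval.
# The record layout (document / instructions / outputs) and the `generate`
# helper are assumptions for illustration, not L-Eval's API.

from typing import Callable, Dict, List


def evaluate_long_context(
    samples: List[Dict],                 # each: {"document": str, "instructions": [...], "outputs": [...]}
    generate: Callable[[str], str],      # wraps the LCLM under evaluation
    max_context_chars: int = 128_000,    # crude character budget; real setups count tokens
) -> float:
    """Return exact-match accuracy over all query-response pairs."""
    correct, total = 0, 0
    for sample in samples:
        document = sample["document"][:max_context_chars]  # truncate overly long inputs
        for query, reference in zip(sample["instructions"], sample["outputs"]):
            prompt = f"{document}\n\nQuestion: {query}\nAnswer:"
            prediction = generate(prompt)
            correct += int(prediction.strip().lower() == reference.strip().lower())
            total += 1
    return correct / max(total, 1)


if __name__ == "__main__":
    # Toy run with a dummy "model" that always answers "42".
    toy = [{
        "document": "A very long report ...",
        "instructions": ["What is the answer?"],
        "outputs": ["42"],
    }]
    print(evaluate_long_context(toy, generate=lambda prompt: "42"))
```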

Publication date: Jul 20, 2023
Project Page: https://github.com/OpenLMLab/LEval
Paper: https://arxiv.org/pdf/2307.11088.pdf