Evaluating Large Language Models for Generalization and Robustness via Data Compression
The paper discusses the challenges in evaluating large language models, including data contamination, prompt sensitivity, and benchmark creation cost. To address these, the authors propose a lossless data compression-based evaluation…
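For readers unfamiliar with the idea, a language model can act as a lossless compressor when its next-token probabilities drive an entropy coder such as arithmetic coding; the compressed size then approaches the model's negative log-likelihood in bits. The sketch below illustrates that general relationship only and is not taken from the paper. The function name `bits_per_byte` and its inputs are hypothetical, and it assumes you already have per-token natural-log probabilities from some model.

```python
import math


def bits_per_byte(token_logprobs, text):
    """Ideal lossless code length in bits per UTF-8 byte of `text`.

    token_logprobs: natural-log probabilities that a model assigned to each
    token of `text` (hypothetical input; obtain it from whatever model you use).
    """
    n_bytes = len(text.encode("utf-8"))
    if n_bytes == 0:
        raise ValueError("text must be non-empty")
    # An ideal entropy coder spends -log2(p) bits on a symbol of probability p.
    total_bits = -sum(token_logprobs) / math.log(2)
    return total_bits / n_bytes


# Toy usage: four one-byte tokens, each predicted with probability 0.5.
rate = bits_per_byte([math.log(0.5)] * 4, "abcd")
```

A lower value means the model predicts the text better. Raw UTF-8 costs 8 bits per byte, so dividing the result by 8 gives a compression ratio relative to the uncompressed text.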