Evaluating Large Language Models for Generalization and Robustness via Data Compression
The article addresses the challenges in evaluating large language models, including data contamination, sensitivity to prompts, and the high cost of benchmark creation. The authors propose a new approach that…