Automated Model Evaluation

Artificial Intelligence, Computation and Language

Evaluating Large Language Models for Generalization and Robustness via Data Compression

root, February 3, 2024

The paper discusses the challenges in evaluating large language models, including data contamination, prompt sensitivity, and benchmark creation cost. To address these, the authors propose a lossless data compression-based evaluation…

Artificial Intelligence, Computation and Language

Energy-based Automated Model Evaluation

root, January 24, 2024

The study tackles issues in Automated Model Evaluation (AutoEval), such as overconfidence and high computational cost. It introduces a novel measure, Meta-Distribution Energy (MDE), to enhance the AutoEval framework’s…
