The study presents CritiqueLLM, a new critique generation model. It addresses the need for comprehensive evaluation of Large Language Models (LLMs) such as GPT-4, which are now widely used for text generation and for which traditional evaluation metrics are of limited effectiveness. To train CritiqueLLM, the authors propose a dialogue-based prompting method for collecting high-quality referenced and reference-free evaluation data. Experimental results show that CritiqueLLM can even outperform GPT-4 on certain tasks, and the authors further demonstrate that its generated critiques can serve as scalable feedback for improving the generation quality of LLMs.
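
To make the referenced/reference-free distinction concrete, the sketch below shows how evaluation prompts for a critique model might be assembled and scored. The prompt wording and the helpers `call_critique_model` and `parse_score` are illustrative assumptions, not the authors' released prompts or code.

```python
# Minimal sketch (not the authors' implementation): building referenced and
# reference-free evaluation prompts for a critique model such as CritiqueLLM.

def build_prompt(question: str, response: str, reference: str | None = None) -> str:
    """Compose an evaluation prompt; include a reference answer when one exists."""
    parts = [
        "You are an evaluator. Judge the response to the question below.",
        f"Question: {question}",
        f"Response: {response}",
    ]
    if reference is not None:  # referenced evaluation; omit for reference-free
        parts.append(f"Reference answer: {reference}")
    parts.append(
        "Write a critique of the response's strengths and weaknesses, "
        "then give a score from 1 to 10 on the last line as 'Score: <n>'."
    )
    return "\n".join(parts)


def call_critique_model(prompt: str) -> str:
    """Hypothetical stand-in for whatever inference API serves the critique model."""
    raise NotImplementedError


def parse_score(critique: str) -> int:
    """Extract the integer score from the final 'Score: <n>' line of the critique."""
    last_line = critique.strip().splitlines()[-1]
    return int(last_line.split(":")[-1].strip())
```

In this setup, the same critique text that justifies the score can be fed back to the generating model as feedback, which is the "scalable feedback" use the authors describe.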


Publication date: 1 Dec 2023
Project Page: https://github.com/thu-coai/CritiqueLLM
Paper: https://arxiv.org/pdf/2311.18702