The study focuses on the development of CritiqueLLM, a new critique generation model. It addresses the need for comprehensive evaluation of Large Language Models (LLMs) such as GPT-4, which are widely used to generate text, while traditional evaluation metrics are of limited effectiveness. The authors therefore propose CritiqueLLM, which relies on a dialogue-based prompting method to collect high-quality referenced and reference-free evaluation data. Experimental results show that CritiqueLLM can even outperform GPT-4 on certain tasks, and the authors further demonstrate that the generated critiques can serve as scalable feedback to improve the generation quality of LLMs.
Publication date: 1 Dec 2023
Project Page: https://github.com/thu-coai/CritiqueLLM
Paper: https://arxiv.org/pdf/2311.18702
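To make the referenced vs. reference-free distinction concrete, here is a minimal sketch of how one might prompt a critique model for either setting. The prompt wording, the `build_critique_prompt` helper, and the `generate` stub are illustrative assumptions, not the authors' actual templates or API; see the project page for the real implementation.

```python
from typing import Optional

# Illustrative sketch: querying a critique model (e.g., a CritiqueLLM checkpoint)
# in referenced or reference-free mode. All names here are hypothetical.

def build_critique_prompt(question: str, response: str,
                          reference: Optional[str] = None) -> str:
    """Assemble an evaluation prompt; include a reference answer only if given."""
    parts = [
        "You are an evaluator. Write a critique of the response below "
        "and end with an overall score from 1 to 10.",
        f"Question: {question}",
        f"Response: {response}",
    ]
    if reference is not None:  # referenced setting
        parts.append(f"Reference answer: {reference}")
    return "\n\n".join(parts)

def generate(prompt: str) -> str:
    """Placeholder for a call to your own critique model or API client."""
    raise NotImplementedError("Hook up a local model or an API client here.")

if __name__ == "__main__":
    # Reference-free evaluation: no gold answer is supplied.
    prompt = build_critique_prompt(
        question="Explain why the sky is blue.",
        response="Blue light is scattered more strongly by air molecules.",
        reference=None,
    )
    print(prompt)
```

The resulting critique text (not just the scalar score) is what the paper argues can be fed back to the generating model as scalable feedback.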