The article introduces AlignBench, a comprehensive benchmark for evaluating the alignment of Chinese large language models (LLMs). The benchmark is built with a human-in-the-loop data curation pipeline and uses automatic evaluation tailored for alignment: model answers are scored by an LLM-as-Judge against human-curated references. A dedicated judge model, CritiqueLLM, recovers 95% of GPT-4's evaluation ability and is available to researchers via public APIs. The benchmark focuses on real-world user queries, open-ended answers, and challenging tasks to reflect authentic LLM usage. All evaluation code, data, and LLM generations are available on the project's GitHub page.
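
The reference-based LLM-as-Judge pattern the benchmark relies on can be sketched as follows. This is a minimal illustration, not AlignBench's actual pipeline: the judge model name, prompt wording, 1-10 scale, and score-parsing regex are all assumptions for demonstration; see the project repo for the real templates.

```python
# Minimal sketch of reference-based LLM-as-Judge scoring.
# Prompt wording, model name, and regex are illustrative assumptions,
# not AlignBench's exact templates.
import re
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are an impartial judge. Given a user question, a reference
answer, and a model answer, rate the model answer on a 1-10 scale, where 10 means
it fully matches the reference in correctness and helpfulness.

Question: {question}
Reference answer: {reference}
Model answer: {answer}

Reply with your reasoning, then a final line formatted as: Score: <number>"""

def judge(question: str, reference: str, answer: str, model: str = "gpt-4") -> int:
    """Ask a judge model to score one answer; returns the parsed 1-10 score."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, reference=reference, answer=answer)}],
        temperature=0,  # deterministic judging
    )
    text = response.choices[0].message.content
    match = re.search(r"Score:\s*(\d+)", text)
    if match is None:
        raise ValueError(f"Judge reply had no parsable score: {text!r}")
    return int(match.group(1))
```

In practice, a judge like this is run over every benchmark query and the per-answer scores are aggregated per task category; AlignBench's contribution is calibrating such judgments with explicit rules and references so a cheaper judge (CritiqueLLM) can stand in for GPT-4.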

Publication date: 1 Dec 2023
Project Page: https://github.com/THUDM/AlignBench
Paper: https://arxiv.org/pdf/2311.18743