The paper introduces LMSYS-Chat-1M, a large-scale dataset of one million real-world conversations with 25 state-of-the-art large language models (LLMs). Collected from 210K unique IP addresses, the dataset offers insight into how users interact with LLMs, including user behaviors, expectations, and trust. It was created to address the lack of diverse, real-user queries available to the research community and is expected to be a valuable resource for understanding and advancing LLM capabilities. The dataset is publicly available.

 

Publication date: 21 Sep 2023
Project Page: https://huggingface.co/datasets/lmsys/lmsys-chat-1m
Paper: https://arxiv.org/pdf/2309.11998