This academic article by Jiyi Li explores whether Large Language Models (LLMs) can outperform crowdsourcing on the data annotation task. The author surveys existing crowdsourcing datasets suitable for a comparative study and builds a benchmark from them. The quality of individual crowd labels is then compared with that of LLM labels. In addition, a Crowd-LLM hybrid label aggregation method is proposed and its performance is evaluated; an illustrative sketch of such hybrid aggregation follows below. The study finds that adding labels from capable LLMs to existing crowdsourcing datasets improves the quality of the aggregated labels, which also exceeds the quality of the LLM labels on their own.
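To make the hybrid idea concrete, the minimal sketch below treats the LLM as one additional annotator and aggregates by simple majority voting. This is only an illustrative assumption: the paper may rely on other aggregation models (e.g., probabilistic ones such as Dawid-Skene), and the function name, labels, and tie-handling here are hypothetical.

```python
from collections import Counter

def aggregate_labels(crowd_labels, llm_label=None):
    """Majority-vote aggregation over crowd labels, optionally adding one LLM label.

    crowd_labels: list of labels from crowd workers for a single item.
    llm_label: optional label from an LLM, treated as one extra annotator
               (an assumption for illustration, not the paper's exact method).
    """
    labels = list(crowd_labels)
    if llm_label is not None:
        labels.append(llm_label)  # the LLM vote is appended like any other worker's vote
    # most_common breaks ties by first-seen order; a real system would handle ties explicitly
    return Counter(labels).most_common(1)[0][0]

# Hypothetical example: two of three workers say "positive", one says "negative".
crowd = ["positive", "negative", "positive"]
print(aggregate_labels(crowd))                        # crowd-only aggregate -> "positive"
print(aggregate_labels(crowd, llm_label="positive"))  # crowd + LLM aggregate -> "positive"
```

Under this assumption, a high-quality LLM label acts as an extra, relatively reliable vote, which is one plausible way adding LLM labels can raise the quality of the aggregated result.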
Publication date: 18 Jan 2024
Project Page: https://arxiv.org/abs/2401.09760v1
Paper: https://arxiv.org/pdf/2401.09760