This position paper examines the role of human evaluation in Natural Language Generation (NLG) systems, focusing on the generation of humour, irony, and sarcasm. The authors argue that the composition of evaluator panels is crucial in this subdomain, and that evaluator demographic characteristics should be reported transparently and replicably. The paper provides an overview of each language form and analyzes how its interpretation varies across participant variables. It also critically surveys recent NLG work to assess how evaluation procedures are reported, highlighting a widespread lack of open reporting of evaluator demographic information.
Publication date: 10 Nov 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2311.05552