This academic paper presents an extensive review of evaluation methods for task-oriented dialogue systems, with a particular focus on practical applications such as customer service. It gives a comprehensive overview of the constructs and metrics used in prior studies, discusses the challenges of dialogue system evaluation, and sets out a research agenda for future evaluation work. The review, which analyzed 122 studies drawn from four databases, found substantial variety in both the constructs and the methods used for evaluation. The authors express hope that future work will take a more critical approach to the operationalisation and specification of the constructs used.
Publication date: 21 Dec 2023
Project Page: https://arxiv.org/abs/2312.13871
Paper: https://arxiv.org/pdf/2312.13871