The article studies performance disparities in multilingual task-oriented dialogue (TOD) systems. It defines new measures that capture disparities both across languages and within individual languages. The nature of the TOD task, the underlying pretrained language model, the target language, and the amount of annotated data are all shown to influence these disparities. The study finds that TOD systems trained for Arabic or Turkish still exhibit diminished performance, even when their training data is parallel to the English data. The article offers insights into these disparities and provides practical tips on TOD data collection and system development for new languages.
Publication date: 20 Oct 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2310.12892