The paper introduces DIALIGHT, a toolkit for developing and evaluating multilingual Task-Oriented Dialogue (TOD) systems. It enables systematic evaluation and comparison of TOD systems built by fine-tuning Pretrained Language Models (PLMs) against those that rely on the zero-shot and in-context learning capabilities of Large Language Models (LLMs). The study finds that while PLM fine-tuning yields higher accuracy and coherence, LLM-based systems produce more diverse and likable responses; however, LLMs struggle to adhere to task-specific instructions and to generate outputs in multiple languages. The authors position the toolkit as a resource for researchers developing and evaluating multilingual TOD systems.
Publication date: 5 Jan 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2401.02208