The study by Kentaro Mitsui, Yukiya Hono, and Kei Sawada presents CHATS, a system that generates spoken dialogues between AI agents from written ones. Large language models (LLMs) can now produce natural written dialogues. Turning these into human-like spoken conversations remains difficult, however, because spoken dialogue has characteristics that text lacks: backchannels, laughter, and smooth turn-taking. CHATS addresses these challenges by generating speech for the speaker and the listener simultaneously, determining how long the silence between utterances should be, and initiating speech generation early where turns overlap. Experimental evaluations suggest that CHATS outperforms the text-to-speech baseline, producing more interactive and fluid dialogues while maintaining clarity and intelligibility.
Publication date: 2 Oct 2023
Abstract Page: https://arxiv.org/abs/2310.01088v1
Paper: https://arxiv.org/pdf/2310.01088