DISCO: A Large Scale Human Annotated Corpus for Disfluency Correction in Indo-European Languages

The paper introduces DISCO, a large-scale, human-annotated corpus for disfluency correction in four Indo-European languages: English, Hindi, German, and French. Disfluency correction removes disfluent elements such as fillers, repetitions, and corrections from spoken utterances, making the resulting text more readable and interpretable. The corpus is intended to aid language understanding tasks and to improve Automatic Speech Recognition outputs. The researchers also demonstrate that disfluency correction has a positive impact on Machine Translation systems.

Publication date: 25 Oct 2023
Project Page: https://github.com/vineet2104/DISCO
Paper: https://arxiv.org/pdf/2310.16749

