DISCO: A Large Scale Human Annotated Corpus for Disfluency Correction in Indo-European Languages

The article introduces DISCO, a large-scale, human-annotated corpus for disfluency correction in four Indo-European languages: English, Hindi, German, and French. Disfluency correction removes disfluent elements such as fillers, repetitions, and corrections from spoken utterances, making the resulting text more readable and easier to interpret. The corpus is intended to support language understanding tasks and to improve the output of Automatic Speech Recognition (ASR) systems. The researchers also show that disfluency correction has a positive impact on Machine Translation systems.
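
To make the task concrete, here is a minimal rule-based sketch of what removing fillers and repetitions can look like. It is not the method used in the DISCO paper. The filler list and the function name `remove_disfluencies` are assumptions made up for this illustration, and the sketch does not handle corrections (for example, "to Boston, no, to Denver"), which generally need a trained model.

```python
import re

# Hypothetical filler list, chosen for illustration only (not taken from DISCO).
FILLERS = {"uh", "um", "er", "hmm"}

def remove_disfluencies(utterance: str) -> str:
    """Toy cleanup: drop filler words and immediately repeated words."""
    # Lowercase and keep only word-like tokens, discarding punctuation.
    tokens = re.findall(r"[\w']+", utterance.lower())
    cleaned = []
    for tok in tokens:
        if tok in FILLERS:
            continue  # skip fillers such as "um" and "uh"
        if cleaned and cleaned[-1] == tok:
            continue  # skip an immediate repetition of the previous kept word
        cleaned.append(tok)
    return " ".join(cleaned)

print(remove_disfluencies("Um, I I want uh a flight to to Boston"))
```

A rule list like this breaks down quickly on real speech, since a repeated word can be intentional and a filler in one language may be a content word in another. That limitation is one reason a human-annotated corpus is useful for training and evaluating correction models.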

Publication date: 25 Oct 2023
Project Page: https://github.com/vineet2104/DISCO
Paper: https://arxiv.org/pdf/2310.16749

automatic speech recognition, Disfluency Correction, Human Annotated Corpus, Indo-European languages, machine translation
