Deepfake audio as a data augmentation technique for training automatic speech to text transcription models

The paper proposes deepfake audio as a data augmentation technique for training robust speech-to-text transcription models. The authors argue that large, diverse labeled datasets are hard to find, especially for languages less widely used than English. To address this, they propose a framework that generates deepfake audio to augment the training data, and they validate it in experiments with existing deepfake and transcription models. The paper concludes that the technique can help build transcription models that handle variation within a language, such as accents.
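As a rough illustration of the general idea, and not the authors' actual framework, the sketch below grows a labeled dataset by passing each sample through a voice-cloning function and keeping the original transcript as the label for the synthetic clip. The names `Sample`, `VoiceCloner`, `augment`, and `dummy_cloner` are hypothetical, and reusing the source transcript as the label is an assumption made for this example. The stand-in cloner only reverses the waveform so the code runs without any model; a real setup would plug in an existing deepfake model instead.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Waveform = List[float]  # simplified stand-in for raw audio samples


@dataclass(frozen=True)
class Sample:
    audio: Waveform
    transcript: str
    synthetic: bool = False  # True for generated (deepfake) clips


# Hypothetical interface: (transcript, reference audio) -> generated audio.
VoiceCloner = Callable[[str, Waveform], Waveform]


def augment(
    dataset: Sequence[Sample],
    clone_voice: VoiceCloner,
    copies: int = 1,
) -> List[Sample]:
    """Return the original samples followed by `copies` synthetic clips per sample.

    Assumption: each synthetic clip reuses the transcript of its source sample.
    """
    augmented = list(dataset)
    for sample in dataset:
        for _ in range(copies):
            fake_audio = clone_voice(sample.transcript, sample.audio)
            augmented.append(Sample(fake_audio, sample.transcript, synthetic=True))
    return augmented


def dummy_cloner(transcript: str, reference: Waveform) -> Waveform:
    """Placeholder only: reverses the reference audio. Not a real deepfake model."""
    return reference[::-1]


if __name__ == "__main__":
    data = [Sample([0.1, 0.2], "hello"), Sample([0.3], "world")]
    bigger = augment(data, dummy_cloner, copies=2)
    print(len(data), "->", len(bigger), "samples")
```

Keeping a `synthetic` flag on each sample makes it possible to train on the combined set while still evaluating on real recordings only.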

Publication date: 2022-03-01
Project Page: https://www.researchgate.net/publication/352837613_Deepfake_audio_as_a_data_augmentation_technique_for_training_automatic_speech_to_text_transcription_models
Paper: https://arxiv.org/pdf/2309.12802
