The paper proposes deepfake audio as a data augmentation technique for training robust speech-to-text transcription models. The authors argue that assembling a large, diverse labeled dataset is difficult, especially for languages with fewer resources than English, and present a framework that expands existing corpora with deepfake-generated audio. The approach was validated through experiments using existing deepfake generation and transcription models. The paper concludes that the technique can help transcription models handle variation within a language, such as accents.
Publication date: 2022-03-01
Project Page: https://www.researchgate.net/publication/352837613_Deepfake_audio_as_a_data_augmentation_technique_for_training_automatic_speech_to_text_transcription_models
Paper: https://arxiv.org/pdf/2309.12802
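The core idea can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: `synthesize` is a placeholder standing in for a real deepfake/TTS voice generator, and the dummy waveform it returns would in practice come from a voice-cloning model re-voicing each transcript with new synthetic speakers.

```python
import random

def synthesize(text: str, speaker_id: int) -> list[float]:
    # Placeholder for a deepfake audio generator (assumption, not the
    # paper's API). Returns a dummy waveform whose length scales with
    # the transcript, standing in for re-voiced speech.
    rng = random.Random((text, speaker_id))
    return [rng.uniform(-1.0, 1.0) for _ in range(len(text) * 160)]

def augment_dataset(dataset, speakers_per_utterance=2):
    """Return the original (audio, transcript) pairs plus deepfake copies.

    Each transcript is re-voiced by `speakers_per_utterance` synthetic
    speakers, multiplying the labeled data without any new manual
    transcription work -- the augmentation strategy the paper studies.
    """
    augmented = list(dataset)
    for _, transcript in dataset:
        for speaker in range(speakers_per_utterance):
            augmented.append((synthesize(transcript, speaker), transcript))
    return augmented

# Two real labeled utterances (silence placeholders for the waveforms).
real_data = [([0.0] * 16000, "hello world"), ([0.0] * 16000, "good morning")]
train_set = augment_dataset(real_data, speakers_per_utterance=2)
print(len(train_set))  # 2 originals + 2 transcripts x 2 speakers = 6
```

The transcript labels carry over to the synthetic audio for free, which is what makes the augmentation cheap: only the acoustic side varies (voice, accent) while the text side stays fixed.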