Phonetically rich corpus construction for a low-resourced language

This paper proposes a method for building a phonetically rich corpus for low-resource languages, using Brazilian Portuguese as its case study: a language with a broad user base but limited speech resources. The researchers developed a sentence selection algorithm based on triphone distribution, together with a new phonemic classification that reflects acoustic-articulatory speech features. Their approach achieved a 55.8% higher percentage of distinct triphones than other available phonetically rich corpora, improving the representation of language-specific speech features.
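To illustrate what triphone-driven sentence selection can look like, here is a minimal sketch of a generic greedy coverage heuristic. It is not the authors' algorithm. It repeatedly picks the candidate sentence that adds the most triphones (three consecutive phones) not yet covered. It assumes sentences are already phonetically transcribed. It does not model the paper's acoustic-articulatory phonemic classification or any weighting by triphone distribution. The function names and the toy transcriptions are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical representation: a triphone is a triple of phone symbols.
Triphone = Tuple[str, ...]


def triphones(phones: Sequence[str]) -> Set[Triphone]:
    """Return the set of distinct triphones (consecutive phone triples) in a transcription."""
    return {tuple(phones[i:i + 3]) for i in range(len(phones) - 2)}


def greedy_select(corpus: Dict[str, List[str]], max_sentences: int) -> List[str]:
    """Greedy sketch (not the paper's method): pick sentences adding the most unseen triphones."""
    candidates = {sid: triphones(p) for sid, p in corpus.items()}
    covered: Set[Triphone] = set()
    selected: List[str] = []
    while len(selected) < max_sentences and candidates:
        # Score each remaining sentence by how many new triphones it would contribute.
        best_id: Optional[str] = None
        best_gain = 0
        for sid, tri in candidates.items():
            gain = len(tri - covered)
            if gain > best_gain:
                best_id, best_gain = sid, gain
        if best_id is None:  # no remaining sentence adds new coverage
            break
        selected.append(best_id)
        covered |= candidates.pop(best_id)
    return selected


# Toy, simplified transcriptions (illustrative only): "casa", "casado", "mar".
corpus = {
    "s1": ["k", "a", "z", "a"],
    "s2": ["k", "a", "z", "a", "d", "u"],
    "s3": ["m", "a", "r"],
}
print(greedy_select(corpus, 3))  # ['s2', 's3']: s1 adds no triphone s2 lacks
```

A greedy heuristic is a common choice because exact maximum-coverage selection is NP-hard. The published method also accounts for how triphones are distributed rather than only counting distinct ones, so treat this as a starting point only.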

Publication date: 8 Feb 2024
Project Page: https://arxiv.org/abs/2402.05794v1
Paper: https://arxiv.org/pdf/2402.05794
