This study presents the Mason-Alberta Phonetic Segmenter (MAPS), a new system for forced alignment, the process used in phonetics to automatically locate the boundaries between speech segments. The system is built on deep neural networks and introduces two modifications: treating acoustic modeling as a tagging task rather than a classification task, and an interpolation technique that places boundaries more precisely. MAPS was evaluated against the Montreal Forced Aligner. The tagging approach did not yield better results, but the interpolation technique produced a significant increase in boundary precision.
Publication date: 25 Oct 2023
Project Page: https://arxiv.org/abs/2310.15425
Paper: https://arxiv.org/pdf/2310.15425
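
The abstract does not spell out how the interpolation technique works, so the following is only a minimal sketch of one way a boundary could be placed between frames rather than snapped to a frame edge. It assumes the acoustic model emits per-frame probabilities for the two neighboring segments, and it linearly interpolates the time at which the two probabilities cross. The function name `interpolate_boundary`, its parameters, and the example values are hypothetical and are not taken from MAPS.

```python
from typing import Optional, Sequence


def interpolate_boundary(
    p_left: Sequence[float],
    p_right: Sequence[float],
    frame_times: Sequence[float],
) -> Optional[float]:
    """Estimate a boundary time between two adjacent segments.

    Hypothetical illustration: p_left and p_right are assumed per-frame
    probabilities of the left and right segment, and frame_times are the
    frame timestamps in seconds. Returns None if the right segment never
    overtakes the left one.
    """
    if not (len(p_left) == len(p_right) == len(frame_times)):
        raise ValueError("all inputs must have the same length")

    # Positive where the left segment is more probable, negative otherwise.
    diff = [left - right for left, right in zip(p_left, p_right)]

    for i in range(len(diff) - 1):
        # Look for the first pair of frames where the sign flips.
        if diff[i] > 0 and diff[i + 1] <= 0:
            # Fraction of the way from frame i to frame i + 1 at which
            # the linearly interpolated difference reaches zero.
            frac = diff[i] / (diff[i] - diff[i + 1])
            step = frame_times[i + 1] - frame_times[i]
            return frame_times[i] + frac * step
    return None


# Made-up example values: four frames spaced 10 ms apart.
times = [0.00, 0.01, 0.02, 0.03]
left = [0.9, 0.8, 0.3, 0.1]
right = [0.1, 0.2, 0.7, 0.9]
boundary = interpolate_boundary(left, right, times)
```

With these made-up values the crossing falls between the second and third frames, so `boundary` is approximately 0.016 s instead of being rounded to 0.01 s or 0.02 s. That sub-frame placement is the kind of precision gain the abstract attributes to interpolation, though the actual method in the paper may differ.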