The article introduces EmphAssess, a prosodic benchmark for evaluating how well speech-to-speech models encode and reproduce prosodic emphasis. It is applied to two tasks: speech resynthesis and speech-to-speech translation. In both, the benchmark measures whether a model encodes the emphasis present in the input speech and reproduces it in the output, potentially across a change of speaker and language. The evaluation pipeline also includes EmphaClass, a new model that classifies emphasis at the frame or word level. Prosodic emphasis matters because it can change the meaning of the conveyed message and adds naturalness to an utterance.
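Checking reproduction comes down to comparing which words are emphasized in the input and in the output. The sketch below illustrates that idea under our own assumptions. It is not the EmphAssess implementation. It assumes frame-level emphasis probabilities, such as a classifier like EmphaClass might produce. It also assumes word boundaries given as frame-index spans, a mean-probability threshold for word-level decisions, and input and output words already aligned one-to-one. The match is scored with precision, recall and F1, a common choice that is also assumed here. `frames_to_words` and `emphasis_f1` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def frames_to_words(frame_probs: Sequence[float],
                    word_spans: Sequence[Tuple[int, int]],
                    threshold: float = 0.5) -> List[bool]:
    """Hypothetical aggregation: a word counts as emphasized when the mean
    frame-level emphasis probability over its span exceeds `threshold`."""
    labels = []
    for start, end in word_spans:  # `end` is exclusive
        span = frame_probs[start:end]
        mean = sum(span) / len(span) if span else 0.0
        labels.append(mean > threshold)
    return labels


def emphasis_f1(reference: Sequence[bool],
                predicted: Sequence[bool]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of emphasized words in the output against the
    input, assuming both word sequences are already aligned one-to-one."""
    if len(reference) != len(predicted):
        raise ValueError("word sequences must be aligned one-to-one")
    tp = sum(r and p for r, p in zip(reference, predicted))
    fp = sum((not r) and p for r, p in zip(reference, predicted))
    fn = sum(r and (not p) for r, p in zip(reference, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy input: three words over six frames; only the middle word is emphasized.
    input_emph = frames_to_words([0.1, 0.2, 0.9, 0.8, 0.1, 0.0],
                                 [(0, 2), (2, 4), (4, 6)])
    # Toy output: the emphasis is kept but also spills onto the last word.
    output_emph = [False, True, True]
    print(input_emph)
    print(emphasis_f1(input_emph, output_emph))
```

In this toy case the emphasized word is recovered, so recall is perfect. The extra emphasis on the last word lowers precision. For translation, the one-to-one assumption does not hold, and a word-alignment step between source and target would be needed before this comparison.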
Publication date: 22 Dec 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2312.14069