EmphAssess: A Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models

The paper introduces EmphAssess, a prosodic benchmark for evaluating how well speech-to-speech models encode and reproduce prosodic emphasis. It is applied to two tasks: speech resynthesis and speech-to-speech translation. In both, the benchmark measures whether emphasis present in the input speech is preserved in the output, potentially across a change of speaker and language. The evaluation pipeline also includes EmphaClass, a new model that classifies emphasis at the frame or word level. Prosodic emphasis matters because it changes the meaning of the conveyed message and adds naturalness to an utterance.
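As a rough illustration of the kind of comparison such a benchmark implies, the sketch below scores emphasis transfer at the word level by comparing which word positions are emphasized in the expected output against those a classifier detects in the model's output. It is not the paper's implementation: the function name `emphasis_transfer_scores`, the use of word indices, and the choice of precision, recall, and F1 as metrics are all assumptions made for illustration.

```python
def emphasis_transfer_scores(expected, predicted):
    """Compare two collections of emphasized word indices.

    expected:  indices of words that should carry emphasis in the output
               (hypothetical input; e.g. derived from the source utterance).
    predicted: indices of words an emphasis classifier flagged in the
               model's output (hypothetical input).
    Returns (precision, recall, f1); each is 0.0 when undefined.
    """
    expected, predicted = set(expected), set(predicted)
    true_pos = len(expected & predicted)

    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(expected) if expected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # The word at index 2 should be emphasized; the classifier flagged 2 and 5.
    print(emphasis_transfer_scores({2}, {2, 5}))
```

For translation, the expected indices would first have to be mapped from source words to target words (for example through a word alignment), which this sketch leaves out.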

Publication date: 22 Dec 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2312.14069
