This article addresses the need for objective metrics in evaluating speech generation. The authors propose three reference-aware evaluation metrics inspired by natural language processing (NLP): SpeechBERTScore, SpeechBLEU, and SpeechTokenDistance. SpeechBERTScore computes BERTScore over self-supervised dense features of the generated and reference speech. Evaluations show that the proposed metrics correlate more strongly with human subjective ratings, remain effective for noisy speech, and have cross-lingual applicability.
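The core computation can be illustrated with a minimal sketch. This assumes frame-level self-supervised features (e.g., from a model like HuBERT) have already been extracted as NumPy arrays; the function name and the exact aggregation are illustrative, not the authors' official implementation, which may differ in details such as layer selection.

```python
import numpy as np

def speech_bertscore(gen_feats: np.ndarray, ref_feats: np.ndarray):
    """BERTScore-style similarity between two frame-level feature sequences.

    gen_feats: (T_gen, D) features of the generated speech.
    ref_feats: (T_ref, D) features of the reference speech.
    Returns (precision, recall, f1), following the BERTScore recipe:
    greedy max-cosine matching between frames, averaged per side.
    """
    # L2-normalize each frame vector so dot products are cosine similarities.
    g = gen_feats / np.linalg.norm(gen_feats, axis=1, keepdims=True)
    r = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)

    sim = g @ r.T  # (T_gen, T_ref) pairwise cosine similarity matrix

    # Precision: each generated frame matched to its best reference frame.
    precision = sim.max(axis=1).mean()
    # Recall: each reference frame matched to its best generated frame.
    recall = sim.max(axis=0).mean()
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For identical feature sequences the score is 1.0 by construction; mismatched speech yields lower values, which is what allows the metric to track perceived quality against a reference.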


Publication date: 31 Jan 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2401.16812