Uncertainty as a Predictor: Leveraging Self-Supervised Learning for Zero-Shot MOS Prediction

This paper examines whether uncertainty measures derived from self-supervised learning (SSL) models such as wav2vec can predict audio quality in voice synthesis and voice conversion systems. Mean Opinion Scores (MOS), the standard subjective measure of quality, are difficult to collect at scale, which motivates an efficient way to predict them. The authors hypothesize that a model's uncertainty about the contents of an audio sequence corresponds to low audio quality. Their findings indicate that uncertainty measures can serve as effective proxies for audio quality, particularly in low-resource settings. The study uses data from the 2022 and 2023 VoiceMOS challenges.
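
To make the core idea concrete, here is a minimal sketch of one way uncertainty could be turned into a quality score. It is an illustration under stated assumptions, not the authors' exact method: it assumes an SSL model has already produced per-frame logits over some discrete set of units, and it uses mean softmax entropy as the uncertainty measure. The function names (`frame_entropy`, `mean_entropy`, `quality_proxy`) and the toy logits are hypothetical.

```python
import math

def softmax(logits):
    """Convert one frame's logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def frame_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution for one frame."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_entropy(frame_logits):
    """Average per-frame entropy over an utterance."""
    if not frame_logits:
        raise ValueError("need at least one frame")
    return sum(frame_entropy(f) for f in frame_logits) / len(frame_logits)

def quality_proxy(frame_logits):
    """Assumed zero-shot proxy: lower uncertainty maps to a higher score."""
    return -mean_entropy(frame_logits)

# Toy logits (hypothetical): a 'confident' and an 'uncertain' utterance.
confident = [[8.0, 0.0, 0.0, 0.0], [0.0, 9.0, 0.0, 0.0]]
uncertain = [[0.0, 0.0, 0.0, 0.0], [0.1, 0.0, 0.1, 0.0]]

print(quality_proxy(confident) > quality_proxy(uncertain))
```

Because the score needs no MOS labels to compute, it can be applied zero-shot; how well it tracks human ratings would still have to be checked against labeled data such as the VoiceMOS sets.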

Publication date: 29 Dec 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2312.15616

