Establishing degrees of closeness between audio recordings along different dimensions using large-scale cross-lingual models

This research seeks to understand what information a pretrained speech model encodes by examining the vector representations of speech it computes. The study proposes an unsupervised method, based on ABX tests on audio recordings, to determine whether the representations computed by a multilingual speech model encode a given characteristic. The method was tested on aspects of room acoustics, on linguistic genre, and on phonetic aspects. The findings suggest that ABX tests can bring out differences in the acoustic setup, in voice properties, or in linguistic content. The study offers a new approach to detecting such factors in corpora intended for unsupervised learning, and a means of classifying recordings for which such metadata are unavailable.
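To make the idea of an ABX test concrete, here is a minimal sketch. It is not the paper's exact protocol. It assumes that each recording has already been reduced to a single fixed-size embedding, for example by mean-pooling a model's frame-level outputs. It also assumes cosine distance between embeddings. ABX evaluations often compare frame sequences with dynamic time warping instead, and the paper's choice may differ. The function names `cosine_distance`, `abx_error_rate` and `symmetric_abx_error` are hypothetical. For each triple, A and X come from the same category (for example, the same room setup) and B comes from the other category. The test counts an error whenever X lies closer to B than to A. An error rate near 0 suggests the representation separates the two categories, and an error rate near 0.5 suggests it does not.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def abx_error_rate(cat_a: List[Vector], cat_b: List[Vector]) -> float:
    """Directional ABX error: A and X are drawn from cat_a, B from cat_b.

    A triple is an error when X is closer to B than to A; ties count as 0.5.
    Requires at least two items in cat_a, so that X can differ from A.
    """
    if len(cat_a) < 2 or not cat_b:
        raise ValueError("need >= 2 items in cat_a and >= 1 item in cat_b")
    errors = 0.0
    total = 0
    for i, a in enumerate(cat_a):
        for j, x in enumerate(cat_a):
            if i == j:
                continue  # X must be a different recording from A
            d_ax = cosine_distance(a, x)
            for b in cat_b:
                d_bx = cosine_distance(b, x)
                if d_ax > d_bx:
                    errors += 1.0
                elif d_ax == d_bx:
                    errors += 0.5
                total += 1
    return errors / total


def symmetric_abx_error(cat_a: List[Vector], cat_b: List[Vector]) -> float:
    """Average of both directions, so neither category plays a special role."""
    return 0.5 * (abx_error_rate(cat_a, cat_b) + abx_error_rate(cat_b, cat_a))


if __name__ == "__main__":
    # Hypothetical toy embeddings for two categories (e.g. two room setups).
    room_1 = [[1.0, 0.0], [0.9, 0.1]]
    room_2 = [[0.0, 1.0], [0.1, 0.9]]
    print(symmetric_abx_error(room_1, room_2))  # prints 0.0: well separated
```

With real data, the toy vectors would be replaced by embeddings extracted from the multilingual model for recordings grouped by the factor under study, such as acoustic setup, speaker voice, or linguistic genre. Averaging both directions keeps the score independent of which category is labelled A.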

Publication date: 9 Feb 2024
Project Page: Unknown
Paper: https://arxiv.org/pdf/2402.05581
