The study targets Speech Emotion Recognition (SER), a key component of human-machine interaction. Existing SER methods overlook the information gap between the speech recognition task used for pre-training and the downstream SER task, which leads to sub-par performance. The authors propose an active-learning-based fine-tuning framework for SER that combines task adaptation pre-training (TAPT), which narrows this gap, with active learning, which selects a subset of samples for fine-tuning. According to the authors, the method reduces time consumption and improves accuracy.
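To make the active learning step concrete, here is a minimal sketch of one common selection strategy, entropy-based uncertainty sampling. The summary does not say which selection strategy the paper uses, so this is an illustration only: the function names, the four-class setup, and the probability values are all hypothetical.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(pool_probs: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Return indices of the `budget` pool samples with the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical softmax outputs over four emotion classes
# for five unlabeled utterances (assumed values, not from the paper).
pool_probs = [
    [0.97, 0.01, 0.01, 0.01],  # model is confident
    [0.25, 0.25, 0.25, 0.25],  # model is maximally uncertain
    [0.60, 0.30, 0.05, 0.05],
    [0.40, 0.40, 0.10, 0.10],
    [0.90, 0.05, 0.03, 0.02],
]

# Pick the two utterances the model is least sure about;
# these would be labeled and used in the next fine-tuning round.
selected = select_most_uncertain(pool_probs, budget=2)
print(selected)
```

In a loop of this kind, only the selected utterances are annotated and added to the fine-tuning set, which is how active learning can cut training time relative to fine-tuning on the full dataset.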

 

Publication date: 4 Oct 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2310.00283