An audio-quality-based multi-strategy approach for target speaker extraction in the MISP 2023 Challenge

The paper presents a new approach to the audio-visual target speaker extraction (AVTSE) task in the Multi-modal Information based Speech Processing (MISP) 2023 Challenge. The approach selects among different extraction strategies according to audio quality, aiming to balance interference removal against speech preservation. The paper reports character error rates of 24.2% and 33.2% on the Dev and Eval sets, respectively, earning second place in the challenge. The study contributes to automatic speech recognition and target speaker extraction by using audio quality as the criterion for choosing an extraction strategy.
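The character error rate quoted above is conventionally the character-level edit distance between the recognizer output and the reference transcript, divided by the reference length. The following is a minimal sketch of that metric, assuming plain Levenshtein distance over characters with no text normalization; the function names are illustrative, and the challenge's official scoring script may differ in its details.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `cer("abcd", "abxd")` gives 0.25: one substitution over four reference characters. Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.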

Publication date: 11 Jan 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2401.03697

