The article presents a novel method for Target-Speaker Voice Activity Detection (TS-VAD), called Profile-Error-Tolerant TS-VAD (PET-TSVAD). TS-VAD relies on speaker profiles estimated in an initial diarization pass, and PET-TSVAD improves on existing TS-VAD techniques by being robust to errors in those profiles. Such errors are common when the profiles come from conventional clustering-based diarization. PET-TSVAD builds on a transformer-based TS-VAD that can handle a variable number of speakers, and it adds a set of pseudo-speaker profiles to account for speakers that the first pass fails to detect. Experimental results show that PET-TSVAD consistently outperforms existing TS-VAD methods.
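The summary does not say how the outputs tied to pseudo-speaker profiles are trained. Because a pseudo profile stands for no particular person, one natural choice, assumed here rather than taken from the article, is permutation-invariant training (PIT): each speaker missed by the first pass is credited to whichever pseudo slot matches it best, and unused slots are trained toward silence. The sketch below illustrates that idea in pure Python on frame-level speech probabilities. The function names `bce` and `pit_pseudo_loss` are hypothetical and do not come from the authors' code.

```python
import itertools
import math


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between a probability sequence and a 0/1 target."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(pred)


def pit_pseudo_loss(pseudo_preds, undetected_refs):
    """Hypothetical PIT loss over pseudo-speaker output slots (illustrative sketch).

    pseudo_preds: K frame-level probability sequences, one per pseudo profile.
    undetected_refs: M <= K reference activity sequences for speakers that the
        first-pass clustering missed. Leftover slots are targeted as silence.
    Returns (best_loss, assignment), where assignment[k] is the index of the
    reference speaker matched to slot k, or None if slot k should stay silent.
    """
    k = len(pseudo_preds)
    n_frames = len(pseudo_preds[0])
    m = len(undetected_refs)
    # Pad the references with all-silent targets so there are exactly K targets.
    targets = list(undetected_refs) + [[0] * n_frames] * (k - m)

    best_loss, best_assignment = math.inf, None
    # Try every slot-to-target permutation and keep the one with the lowest loss.
    for perm in itertools.permutations(range(k)):
        loss = sum(bce(pseudo_preds[s], targets[perm[s]]) for s in range(k)) / k
        if loss < best_loss:
            best_loss = loss
            best_assignment = [perm[s] if perm[s] < m else None for s in range(k)]
    return best_loss, best_assignment


# Two pseudo slots, one speaker the first pass missed (active in frames 0-1).
loss, assignment = pit_pseudo_loss(
    [[0.1, 0.1, 0.2, 0.1], [0.9, 0.8, 0.1, 0.2]],
    [[1, 1, 0, 0]],
)
print(assignment)  # [None, 0]: slot 1 is credited with the missed speaker
```

Exhaustive permutation search is only practical for a handful of pseudo slots. A real system would more likely use Hungarian matching over a cost matrix, but the brute-force loop keeps the idea easy to follow.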

Publication date: 25 Sep 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2309.12521