Spatial-Temporal Activity-Informed Diarization and Separation

The paper presents a multichannel speaker diarization and separation system driven by spatial-temporal activity information. The architecture combines array signal processing units with deep learning units. For diarization, a Spatial Activity-driven Speaker Diarization network (SASDnet) estimates each speaker's activity from a spatial coherence matrix. For separation, the authors propose a Global and Local Activity-driven Speaker Extraction network (GLASEnet). The authors report superior speaker diarization, counting, and separation performance at low computational complexity.
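To make the idea of a spatial coherence matrix more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: the function name `spatial_coherence_matrix`, the phase-only whitening of the relative transfer function, and the input layout (channels, frequencies, frames) are all choices made for this example. The sketch compares the spatial signature of every pair of time frames, so frames dominated by the same source direction score close to 1.

```python
import numpy as np

def spatial_coherence_matrix(stft, ref=0, eps=1e-12):
    """Frame-by-frame spatial coherence from a multichannel STFT.

    stft: complex array of shape (channels, freqs, frames).
    Returns a real (frames, frames) matrix with entries in [0, 1].
    All design choices here are assumptions made for illustration.
    """
    # Relative transfer function estimate: each channel over the reference channel.
    rtf = stft / (stft[ref:ref + 1] + eps)
    # Whitening (assumed): keep only the phase, discard the magnitude.
    w = rtf / (np.abs(rtf) + eps)
    m, f, t = w.shape
    # Stack channels and frequencies into one spatial feature vector per frame.
    feats = w.reshape(m * f, t)
    feats = feats / (np.linalg.norm(feats, axis=0, keepdims=True) + eps)
    # Magnitude of the normalized inner product between every pair of frames.
    return np.abs(feats.conj().T @ feats)
```

In a matrix computed this way, frames that share a dominant source direction form blocks of high coherence, which is the kind of structure a diarization network could take as input to estimate who is active when.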

Publication date: 31 Jan 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2401.16850
