The authors introduce the Temporal Audio Source Counting Network (TaCNet), an architecture that addresses limitations in audio source counting. TaCNet operates directly on raw audio, which removes the need for complex preprocessing and simplifies the overall pipeline. The authors report that it performs particularly well in real-time speaker counting, even when input windows are truncated. Evaluated on the LibriCount dataset, TaCNet reaches an average accuracy of 74.18% over 11 classes, which the authors present as state-of-the-art for audio source counting. They also apply it to Chinese and Persian speech, which suggests the approach carries across languages.
Publication date: 4 Nov 2023
Project Page: https://arxiv.org/abs/2311.02369
Paper: https://arxiv.org/pdf/2311.02369
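
As an illustration of how an "average accuracy over 11 classes" can be computed, here is a minimal sketch in Python. It assumes that each class is a speaker count from 0 to 10 and that the average is the unweighted mean of per-class accuracies; the summary above confirms neither assumption, so the paper may define the metric differently. The function names and toy labels are hypothetical and are not taken from the paper.

```python
NUM_CLASSES = 11  # assumption: one class per speaker count, 0 through 10


def per_class_accuracy(y_true, y_pred, num_classes=NUM_CLASSES):
    """Entry k is the fraction of clips with true count k predicted as k,
    or None when no clip has true count k."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return [c / n if n else None for c, n in zip(correct, total)]


def average_accuracy(y_true, y_pred, num_classes=NUM_CLASSES):
    """Unweighted mean of per-class accuracies, skipping empty classes."""
    accs = [a for a in per_class_accuracy(y_true, y_pred, num_classes)
            if a is not None]
    return sum(accs) / len(accs)


# Toy labels, made up for illustration only (not results from the paper).
y_true = [0, 0, 1, 1, 2, 2, 10, 10]
y_pred = [0, 0, 1, 2, 2, 2, 9, 10]
print(average_accuracy(y_true, y_pred))
```

Averaging per class rather than per clip keeps a heavily represented speaker count from dominating the score. When every class has the same number of clips, the two averages coincide.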