The study presents SAIC, a pipeline that integrates speech anonymization and identity classification. On speaker identity classification, it achieves a top-1 accuracy of 96.1% on the VoxCeleb1 dataset. SAIC extracts content and identity embeddings from audio and can remove identity information from the original recording. It can also merge the content of one recording with the voiceprint of another speaker, generating synthesized speech that preserves the content while altering the identity. Although SAIC was not trained on clinical data, the results suggest it may be applicable in the healthcare sector.
Publication date: 23 Dec 2023
Project Page: https://arxiv.org/abs/2312.15190v1
Paper: https://arxiv.org/pdf/2312.15190
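
For readers unfamiliar with the metric quoted in the summary, the following is a minimal sketch of how a top-1 accuracy figure is generally computed for speaker identification. It is not taken from the SAIC paper: the function name `top1_accuracy`, the toy scores, and the three hypothetical speakers are assumptions made purely for illustration.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of utterances whose highest-scoring speaker is the true speaker.

    `scores[i][k]` is a (hypothetical) classifier score for speaker k on
    utterance i; `labels[i]` is the index of the true speaker.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")

    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score; ties resolve to the lowest index.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Toy data (illustrative only): 4 utterances, 3 hypothetical speakers.
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]
toy_labels = [1, 0, 2, 0]

toy_accuracy = top1_accuracy(toy_scores, toy_labels)
print(f"top-1 accuracy: {toy_accuracy:.2f}")
```

In this toy case three of the four utterances are assigned to the correct speaker. The reported 96.1% is the same ratio computed over the VoxCeleb1 evaluation data.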