This paper presents an approach to personalizing keyword spotting with speaker information. The method uses Feature-wise Linear Modulation (FiLM), a technique for learning from multiple information sources, to condition keyword detection on speaker data. The authors experiment with both text-dependent and text-independent speaker recognition systems to extract that speaker information. They report that the approach significantly improves keyword detection accuracy, especially for underrepresented speaker groups. It requires only a minor increase in the number of parameters, which makes it practical for real-world applications.
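To make the FiLM idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the general FiLM formulation, in which a conditioning vector (here a speaker embedding) is mapped to a per-channel scale `gamma` and shift `beta` that modulate the keyword spotting features. The function names, shapes, and weights are hypothetical, and the sketch does not say where in the network the authors apply the modulation.

```python
def linear(x, weights, bias):
    """Affine map: one output value per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def film(features, speaker_embedding, gamma_w, gamma_b, beta_w, beta_b):
    """Apply FiLM: out[t][c] = gamma[c] * features[t][c] + beta[c].

    features: list of frames, each a list of C channel values.
    speaker_embedding: list of D values from a speaker recognition model.
    gamma_w, beta_w: C x D weight matrices (hypothetical, normally learned).
    gamma_b, beta_b: length-C bias vectors (hypothetical, normally learned).
    """
    gamma = linear(speaker_embedding, gamma_w, gamma_b)  # per-channel scale
    beta = linear(speaker_embedding, beta_w, beta_b)     # per-channel shift
    return [[g * v + b for g, v, b in zip(gamma, frame, beta)]
            for frame in features]


# Toy example: 2 frames, 2 channels, 3-dimensional speaker embedding.
features = [[1.0, 2.0], [3.0, 4.0]]
speaker_embedding = [1.0, 0.0, -1.0]
gamma_w = [[0.5, 0.0, 0.0], [0.0, 0.0, 1.0]]
gamma_b = [1.0, 1.0]
beta_w = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
beta_b = [0.0, 0.0]

modulated = film(features, speaker_embedding, gamma_w, gamma_b, beta_w, beta_b)
print(modulated)
```

In this sketch the only added parameters are the two small linear maps that produce `gamma` and `beta`, that is 2 × C × (D + 1) values for C channels and a D-dimensional embedding. This illustrates why FiLM-style conditioning can be cheap, although the exact parameter count in the paper depends on its architecture.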

Publication date: 6 Nov 2023
Project Page: https://arxiv.org/abs/2311.03419
Paper: https://arxiv.org/pdf/2311.03419