This study examines Parameter-Efficient Fine-Tuning (PEFT) for speech processing, exploring where PEFT modules should be placed, how they should be merged, and whether ensemble learning helps. The results show that Differentiable Architecture Search (DARTS) does not outperform the baseline of inserting the same PEFT method into every layer of a Self-Supervised Learning (SSL) model. Ensemble learning, particularly majority voting, instead delivers the strongest performance. The study suggests that different PEFT methods learn in distinct ways, which may explain why combining them through ensemble learning exploits their complementary strengths more effectively than optimizing their placement layer by layer.
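
As a rough illustration of the majority-voting ensemble mentioned above, the sketch below combines per-utterance class predictions from several separately fine-tuned PEFT variants. The method names, labels, and data are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of majority voting over predictions from models fine-tuned
# with different PEFT methods (e.g., adapters, LoRA, prefix tuning).
# Method names, labels, and predictions are hypothetical examples.
from collections import Counter
from typing import Dict, List


def majority_vote(predictions: Dict[str, List[int]]) -> List[int]:
    """Combine per-utterance class predictions from several PEFT variants.

    `predictions` maps a PEFT method name to its list of predicted labels;
    all lists are assumed to cover the same utterances in the same order.
    Ties go to the label encountered first in iteration order.
    """
    methods = list(predictions)
    num_utterances = len(predictions[methods[0]])
    voted = []
    for i in range(num_utterances):
        votes = [predictions[m][i] for m in methods]
        voted.append(Counter(votes).most_common(1)[0][0])
    return voted


if __name__ == "__main__":
    # Hypothetical predictions on four utterances from three PEFT variants.
    preds = {
        "adapter": [0, 1, 1, 2],
        "lora": [0, 1, 0, 2],
        "prefix": [1, 1, 1, 2],
    }
    print(majority_vote(preds))  # -> [0, 1, 1, 2]
```

In this toy example, each PEFT variant errs on a different utterance, yet the vote recovers the consensus label, which mirrors the intuition that the variants' differing learning behaviours can complement one another.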

 

Publication date: 5 Jan 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2401.02122