The article examines the effectiveness of Parameter-Efficient Fine-Tuning (PEFT) in speech processing, focusing on where PEFT modules are best placed, how different PEFT methods can be merged, and how ensemble techniques can combine them. Through extensive experiments, the authors found that Differentiable Architecture Search (DARTS) does not outperform the baseline of inserting the same PEFT method into every layer of a Self-Supervised Learning (SSL) model. In contrast, an ensemble learning approach, particularly one based on majority voting, achieved superior performance. The study suggests that different PEFT methods learn in distinct ways, which may explain why combining them through ensemble learning exploits their complementary strengths more effectively than optimizing the insertion layer for a single method.
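
Since the summary centers on combining models adapted with different PEFT methods via majority voting, a minimal sketch of that combination step may help. This is an illustrative assumption of the setup, not code from the paper: the model names (`lora_model`, `adapter_model`, `prefix_model`) and the utterance-level classification setting are hypothetical.

```python
# Minimal sketch: majority voting over class predictions from SSL models
# fine-tuned with different (hypothetical) PEFT methods.
from collections import Counter

import torch


def majority_vote(predictions: list[torch.Tensor]) -> torch.Tensor:
    """Combine per-model class predictions (each of shape [batch]) by majority vote.

    Ties are broken in favour of the first model in the ensemble.
    """
    stacked = torch.stack(predictions)  # [num_models, batch]
    voted = []
    for i in range(stacked.shape[1]):
        counts = Counter(stacked[:, i].tolist())
        top, n = counts.most_common(1)[0]
        first = stacked[0, i].item()
        if counts[first] == n:  # tie-break toward the first model
            top = first
        voted.append(int(top))
    return torch.tensor(voted)


# Usage (assumed setup): each model is the same SSL backbone adapted with a
# different PEFT method, producing logits over the same label set.
# preds = [model(batch).argmax(dim=-1) for model in (lora_model, adapter_model, prefix_model)]
# final = majority_vote(preds)
```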


Publication date: 5 Jan 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2401.02122