The research focuses on Prompt Tuning, a parameter-efficient fine-tuning method for pre-trained large language models. The authors investigate 'skill neurons', neurons whose activations are highly predictive of the labels of a given task. The study shows that prompts tuned for a specific task transfer to similar tasks but exhibit low robustness to adversarial data, and that T5 models are more robust than RoBERTa models. The research also confirms the existence of skill neurons in both RoBERTa and T5. Importantly, the skill neurons of T5 remain predictive on adversarial data, whereas those of RoBERTa do not, suggesting that adversarial robustness may be linked to a model's ability to activate relevant skill neurons on adversarial data.
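To make the notion of a 'predictive' neuron concrete, below is a minimal sketch of how per-neuron predictivity is typically scored in the skill-neuron literature: a neuron's mean activation on training data serves as a threshold, and the fraction of evaluation examples whose label is correctly predicted by "activation above threshold" is its predictivity. The function name, the random placeholder data, and the exact thresholding choice are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def neuron_predictivity(train_acts, train_labels, eval_acts, eval_labels):
    """Score how predictive a single neuron is for a binary task.

    Sketch of the common skill-neuron recipe: threshold the neuron's
    activation at its mean training activation, then measure how often
    "activation above threshold" matches the task label on eval data.
    """
    threshold = train_acts.mean()                # baseline activation on training data
    preds = (eval_acts > threshold).astype(int)  # thresholded per-example predictions
    acc = (preds == eval_labels).mean()
    # A neuron may encode the label with inverted polarity,
    # so keep the better of the two orientations.
    return max(acc, 1.0 - acc)

# Hypothetical usage with placeholder activations and labels.
rng = np.random.default_rng(0)
train_acts = rng.normal(size=200)
train_labels = rng.integers(0, 2, size=200)
eval_acts = rng.normal(size=100)
eval_labels = rng.integers(0, 2, size=100)
print(neuron_predictivity(train_acts, train_labels, eval_acts, eval_labels))
```

Under this scoring, "skill neurons remain predictive on adversarial data" simply means the same thresholded predictions stay well above chance when the evaluation activations come from adversarial inputs.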
Publication date: 21 Sep 2023
Project Page: https://arxiv.org/pdf/2309.12263v1.pdf
Paper: https://arxiv.org/pdf/2309.12263