The paper develops a theoretical understanding of fine-tuning methods such as prompting and prefix-tuning for transformer models. It shows that these methods can universally approximate sequence-to-sequence functions, and that even smaller pretrained transformers can act as universal approximators once a prefix is attached. The paper also highlights that the attention mechanism is particularly well suited to universal approximation, with a single attention head sufficing to approximate any continuous function. Finally, it provides bounds on the prefix length needed to approximate a function to a desired precision.
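To make the setting concrete, here is a minimal sketch of the mechanism the theory concerns: prefix-tuning prepends trainable vectors to the keys and values of a frozen attention head, so only the prefix changes while the head's weights stay fixed. This is an illustrative toy example, not code from the paper; all names (`prefix_attention`, `prefix_k`, `prefix_v`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    """Single-head scaled dot-product attention."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ values

def prefix_attention(queries, keys, values, prefix_k, prefix_v):
    """Same frozen head, but with trainable prefix vectors prepended to the
    keys and values; only prefix_k / prefix_v would be tuned."""
    return attention(queries,
                     np.vstack([prefix_k, keys]),
                     np.vstack([prefix_v, values]))

# Toy usage: 4 input tokens, model dimension 8, prefix of length 3.
rng = np.random.default_rng(0)
x_q, x_k, x_v = (rng.normal(size=(4, 8)) for _ in range(3))
p_k, p_v = rng.normal(size=(3, 8)), rng.normal(size=(3, 8))
print(prefix_attention(x_q, x_k, x_v, p_k, p_v).shape)  # (4, 8)
```

The paper's question can be read off this sketch: how expressive is the map from inputs to outputs when only the prefix length and prefix values vary, and how long must the prefix be to reach a target approximation error.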

Publication date: 23 Feb 2024
Project Page: Not Provided
Paper: https://arxiv.org/pdf/2402.14753