The study investigates the in-context learning capabilities of Transformers and large language models (LLMs). Building on prior work showing that Transformers can learn gradient-based algorithms in context for real-valued function classes, the paper evaluates them on a test-bed of Boolean function classes: Transformers nearly match the optimal learning algorithm on simpler tasks, but their performance deteriorates on more complex ones. The study also finds that certain attention-free models perform almost identically to Transformers on a range of tasks. Interestingly, Transformers can learn to implement two distinct algorithms to solve a single task and adaptively select the more sample-efficient one depending on the sequence of in-context examples.
Publication date: 4 Oct 2023
Project Page: https://arxiv.org/abs/2310.03016
Paper: https://arxiv.org/pdf/2310.03016
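
To make the experimental setting concrete, below is a minimal, hypothetical sketch of the kind of in-context prediction task such studies use: a prompt of labelled points drawn from a Boolean function class, with one held-out query the learner must label, scored here against a simple nearest-neighbour rule. The conjunction function class, the dimensions, and the helper names (`sample_conjunction`, `nearest_neighbor_predict`) are illustrative assumptions, not the paper's code or hyperparameters.

```python
# Hypothetical sketch of an in-context learning task over a Boolean function class.
# A "prompt" is a sequence of (x, f(x)) pairs plus a query point; the learner's job
# is to predict f(query). The nearest-neighbour rule below is an illustrative baseline.
import numpy as np

rng = np.random.default_rng(0)


def sample_conjunction(n_bits: int, k: int):
    """Sample a conjunction over k of the n_bits input coordinates."""
    relevant = rng.choice(n_bits, size=k, replace=False)
    return lambda x: int(np.all(x[relevant] == 1))


def make_prompt(f, n_bits: int, n_examples: int):
    """Build an in-context prompt: n_examples labelled points plus one held-out query."""
    xs = rng.integers(0, 2, size=(n_examples + 1, n_bits))
    ys = np.array([f(x) for x in xs])
    return xs[:-1], ys[:-1], xs[-1], ys[-1]


def nearest_neighbor_predict(xs, ys, query):
    """Label the query with the label of its closest in-context example (Hamming distance)."""
    dists = np.abs(xs - query).sum(axis=1)
    return ys[np.argmin(dists)]


if __name__ == "__main__":
    n_bits, k, n_examples, n_trials = 20, 3, 40, 500
    correct = 0
    for _ in range(n_trials):
        f = sample_conjunction(n_bits, k)
        xs, ys, query, target = make_prompt(f, n_bits, n_examples)
        correct += int(nearest_neighbor_predict(xs, ys, query) == target)
    print(f"nearest-neighbour baseline accuracy: {correct / n_trials:.3f}")
```

In the paper's setting, a trained Transformer (or an attention-free model) plays the role of the learner that maps the prompt to a prediction; the sketch above only illustrates the task format and one simple baseline against which such learners can be compared.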