The study investigates the in-context learning capabilities of Transformers and Large Language Models (LLMs). It shows that Transformers can implement gradient-based learning algorithms for various real-valued functions, but that their performance degrades on more complex tasks. The study also highlights that attention-free models perform nearly on par with Transformers across a range of tasks. Interestingly, Transformers can learn to implement two distinct algorithms for a single task and adaptively select the more sample-efficient one.
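To make the evaluation setup concrete, below is a minimal sketch (not taken from the paper) of how in-context learning of a function class is typically measured in this line of work: each episode samples a random task, presents (x, f(x)) pairs in context, and scores the model's prediction on a held-out query. The `model_predict` interface and the linear-function task are assumptions for illustration; the least-squares baseline stands in for the kind of near-optimal learner Transformers are compared against.

```python
import numpy as np

def make_icl_episode(n_examples=20, dim=5, seed=0):
    """Build one in-context learning episode for a random linear function.

    Returns the in-context (x, f(x)) pairs plus a held-out query point and
    its true label. How these pairs are serialized into a prompt depends on
    the model being evaluated.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=dim)                 # task: f(x) = w . x
    xs = rng.normal(size=(n_examples, dim))  # in-context inputs
    ys = xs @ w                              # in-context labels
    x_query = rng.normal(size=dim)
    return xs, ys, x_query, x_query @ w

def evaluate(model_predict, n_episodes=100):
    """Average squared error of in-context predictions over many episodes.

    `model_predict(xs, ys, x_query)` is an assumed interface: given the
    in-context pairs, it returns a prediction for x_query.
    """
    errs = []
    for seed in range(n_episodes):
        xs, ys, xq, yq = make_icl_episode(seed=seed)
        errs.append((model_predict(xs, ys, xq) - yq) ** 2)
    return float(np.mean(errs))

def least_squares_baseline(xs, ys, x_query):
    """Reference learner: fit the in-context examples by least squares."""
    w_hat, *_ = np.linalg.lstsq(xs, ys, rcond=None)
    return x_query @ w_hat
```

For example, `evaluate(least_squares_baseline)` yields near-zero error on this linear task, giving a ceiling against which a Transformer's in-context predictions can be compared.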

Publication date: 4 Oct 2023
Project Page: https://arxiv.org/abs/2310.03016
Paper: https://arxiv.org/pdf/2310.03016