The paper examines in-context learning (ICL) in transformer-based large language models, asking how transformers learn in context beyond simple function classes, with a focus on learning with representations. The authors construct synthetic ICL problems in which the label depends on the input only through a potentially complex but fixed representation function. They find that trained transformers consistently achieve near-optimal ICL performance in this setting and exhibit a clear dissection: lower layers transform the inputs into the representation, while upper layers perform linear ICL on top of it. The paper also uncovers several mechanisms inside the trained transformers that align well with the theory, offering insight into how transformers perform ICL in more realistic scenarios.
Publication date: 17 Oct 2023
Project Page: ?
Paper: https://arxiv.org/pdf/2310.10616
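Below is a minimal sketch (not the paper's code) of the kind of synthetic "learning with representations" ICL task described above, together with the natural near-optimal baseline: ridge regression on top of the fixed representation. The choice of a random ReLU MLP as the representation, the dimensions, the noise level, and the helper names (`phi`, `sample_task`, `ridge_on_representation`) are illustrative assumptions, not details taken from the paper.

```python
# Sketch: synthetic ICL tasks where labels depend on x only through a fixed
# representation Phi, plus a ridge-regression-on-Phi baseline (assumed setup).
import numpy as np

rng = np.random.default_rng(0)

d_in, d_rep = 20, 16         # input and representation dimensions (illustrative)
n_context, n_query = 40, 10  # in-context examples and held-out queries
noise_std = 0.1

# Fixed (task-independent) representation function Phi: a random 2-layer ReLU MLP.
W1 = rng.normal(size=(d_rep, d_in)) / np.sqrt(d_in)
W2 = rng.normal(size=(d_rep, d_rep)) / np.sqrt(d_rep)

def phi(x):
    return W2 @ np.maximum(W1 @ x, 0.0)  # shared across all tasks

def sample_task():
    """One ICL instance: a task-specific linear head on top of the fixed Phi."""
    w = rng.normal(size=d_rep) / np.sqrt(d_rep)
    X = rng.normal(size=(n_context + n_query, d_in))
    y = np.array([w @ phi(x) for x in X]) + noise_std * rng.normal(size=len(X))
    return X[:n_context], y[:n_context], X[n_context:], y[n_context:]

def ridge_on_representation(Xc, yc, Xq, lam=1e-2):
    """Near-optimal baseline: ridge regression on the (known) representation."""
    Pc = np.stack([phi(x) for x in Xc])
    Pq = np.stack([phi(x) for x in Xq])
    w_hat = np.linalg.solve(Pc.T @ Pc + lam * np.eye(d_rep), Pc.T @ yc)
    return Pq @ w_hat

Xc, yc, Xq, yq = sample_task()
pred = ridge_on_representation(Xc, yc, Xq)
print("baseline test MSE:", float(np.mean((pred - yq) ** 2)))
```

In the paper's setting, a transformer trained across many such tasks is evaluated against this kind of baseline; the reported dissection corresponds to lower layers approximating `phi` and upper layers implementing the linear in-context regression step.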