The paper examines in-context learning (ICL) in large language models built on the transformer architecture, focusing on how transformers learn in more complex scenarios, specifically learning with representations. The authors construct synthetic in-context learning problems in which the label depends on the input through a possibly complex but fixed representation function, composed with a linear function that differs across instances. Trained transformers consistently achieve near-optimal ICL performance in this setting, exhibiting a dissection in which the lower layers transform the dataset and the upper layers perform linear ICL on the transformed data. The paper further uncovers several mechanisms within the trained transformers that align well with the theory and shed light on how transformers perform ICL in more realistic scenarios.
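
As a rough illustration of the described construction, the sketch below generates one synthetic ICL instance in which a fixed representation function is composed with an instance-specific linear map. It is only a minimal illustration: the dimensions, the ReLU feature map standing in for the representation function, and all names (`phi`, `sample_icl_instance`, etc.) are assumptions for this sketch, not the paper's actual configuration or code.

```python
# Hypothetical sketch of the synthetic ICL setup: a shared, fixed representation
# function phi, composed with a linear function drawn fresh for each instance.
import numpy as np

rng = np.random.default_rng(0)

D_IN, D_REP, N_CONTEXT = 8, 16, 32            # assumed dimensions, for illustration only

# Fixed representation phi (here a random ReLU feature map, an assumption;
# the paper only requires phi to be fixed across instances and possibly complex).
W1 = rng.normal(size=(D_REP, D_IN)) / np.sqrt(D_IN)

def phi(x):
    # Same feature map for every task/instance.
    return np.maximum(W1 @ x, 0.0)

def sample_icl_instance():
    """One in-context learning instance: (x_i, y_i) context pairs plus a query.

    The label depends on x only through phi, composed with a linear function w
    that differs across instances."""
    w = rng.normal(size=D_REP) / np.sqrt(D_REP)   # instance-specific linear function
    xs = rng.normal(size=(N_CONTEXT + 1, D_IN))   # context inputs plus one query input
    ys = np.array([w @ phi(x) for x in xs])
    context = list(zip(xs[:-1], ys[:-1]))
    query_x, query_y = xs[-1], ys[-1]             # the model must predict query_y in context
    return context, query_x, query_y

context, query_x, query_y = sample_icl_instance()
```

Under this construction, the optimal ICL strategy is to first map the inputs through the fixed representation and then solve a linear problem on the transformed dataset, which matches the lower-layers/upper-layers dissection described above.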

Publication date: 17 Oct 2023
Project Page: ?
Paper: https://arxiv.org/pdf/2310.10616