How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations
The paper examines in-context learning (ICL) in large language models built on the transformer architecture. It studies how transformers perform ICL on function classes that go beyond simple (e.g., purely linear) functions, focusing on learning with representations. The authors…
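To make the setting concrete, below is a minimal sketch of what an "ICL with representations" data-generating process might look like: a representation function shared across tasks, composed with a task-specific linear readout, so each in-context prompt is a set of (x, y) pairs from one such task. All names, dimensions, and the ReLU-MLP form of the representation are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_rep = 8, 16          # input and representation dimensions (assumed)
W1 = rng.normal(size=(d_rep, d_in)) / np.sqrt(d_in)   # fixed across all tasks

def phi(x):
    """Shared representation: one hidden layer with ReLU (an assumption)."""
    return np.maximum(W1 @ x, 0.0)

def sample_icl_prompt(n_examples=20, noise=0.1):
    """Draw one in-context task: a fresh linear readout over the shared phi."""
    w = rng.normal(size=d_rep) / np.sqrt(d_rep)        # task-specific weights
    xs = rng.normal(size=(n_examples, d_in))
    ys = np.array([w @ phi(x) for x in xs]) + noise * rng.normal(size=n_examples)
    return xs, ys                                      # (x_i, y_i) pairs fed as context

xs, ys = sample_icl_prompt()
print(xs.shape, ys.shape)   # (20, 8) (20,)
```

Under this kind of composed function class, the question the paper asks is how a trained transformer handles the shared representation versus the task-specific linear part when predicting from the in-context examples.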