This article focuses on the capabilities of transformer networks such as ChatGPT and other Large Language Models (LLMs). These networks transform a complete input sequence, such as all the words in a sentence, into a long encoding vector, which allows them to learn long-range temporal dependencies. Self-attention applied to this encoding vector enhances temporal context by computing associations between pairs of words in the input sequence. The authors suggest that cortical waves of neural activity could implement a similar encoding principle, providing temporal context by encapsulating recent input history into a single spatial pattern at each moment in time.
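As a rough illustration of the mechanism described above (a minimal sketch, not code from the paper), scaled dot-product self-attention compares every token's encoding with every other token's, so the output at each position mixes in context from the whole sequence. All names and dimensions below are hypothetical.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) token encodings; Wq/Wk/Wv: learned projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # pairwise word-word associations
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
    return weights @ V                              # context-weighted mixture per position

# Toy usage: 5 "words" with 8-dimensional encodings
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)  # shape (5, 8): each row now carries sequence-wide context
```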

Publication date: 26 Jan 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2401.14267