This article focuses on the capabilities of transformer networks such as ChatGPT and other Large Language Models (LLMs). These networks transform a complete input sequence, such as a sentence, into encoding vectors, which allows them to learn long-range temporal dependencies. The self-attention mechanism supplies this temporal context by computing associations between every pair of words in the input sequence. The authors suggest that cortical waves of neural activity could implement a similar encoding principle, providing temporal context by encapsulating recent input history into a single spatial pattern at each moment in time.
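To make the pairwise-association idea concrete, here is a minimal sketch of scaled dot-product self-attention. It is illustrative only and not taken from the paper; the matrix names (Wq, Wk, Wv), dimensions, and the toy input are assumptions chosen for the example.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token embeddings.

    X          : (seq_len, d_model) input embeddings
    Wq, Wk, Wv : (d_model, d_k) projection matrices (learned in practice)
    Returns    : (seq_len, d_k) context-enriched representations
    """
    Q = X @ Wq  # queries
    K = X @ Wk  # keys
    V = X @ Wv  # values
    d_k = Q.shape[-1]

    # Association scores between every pair of positions in the sequence
    scores = Q @ K.T / np.sqrt(d_k)              # (seq_len, seq_len)

    # Row-wise softmax: each position weighs its view of the whole sequence
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # Each output is a weighted mix of all positions: the "temporal context"
    return weights @ V

# Toy usage: 5 "words", 8-dimensional embeddings, 4-dimensional projections
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 4)
```

The key point for the paper's analogy is that each output row mixes information from the entire input at once, so the whole recent history is folded into a single representation per position.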
Publication date: 26 Jan 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2401.14267