The paper by Bobby He and Thomas Hofmann focuses on simplifying transformer blocks in deep learning. They ask whether components such as skip connections, projection/value matrices, sequential sub-blocks, and normalization layers can be removed without loss of training speed. Their experiments show that the simplified transformers match the per-update training speed and performance of standard transformers, while enjoying 15% faster training throughput and using 15% fewer parameters. They also highlight the role of signal propagation theory in motivating these modifications, as well as its limitations for understanding the training dynamics of deep neural networks.
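
To make the structural changes concrete, below is a minimal PyTorch sketch (not the authors' code) contrasting a standard pre-LN block with a block that drops the components listed above: no residual branches, no LayerNorm, and attention with identity value and output projections. The class names and dimensions are illustrative, and the sketch omits the signal-propagation corrections (e.g., modified attention and initialization) that the paper relies on to train such blocks without loss of speed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StandardBlock(nn.Module):
    """Standard pre-LN transformer block: norm -> attention -> skip, then norm -> MLP -> skip."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # attention sub-block + skip connection
        x = x + self.mlp(self.norm2(x))                    # MLP sub-block + skip connection
        return x


class SimplifiedBlockSketch(nn.Module):
    """Illustrative simplified block: no skip connections, no normalization,
    and attention without separate value/output projection matrices."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model, bias=False)  # query projection
        self.k_proj = nn.Linear(d_model, d_model, bias=False)  # key projection
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        B, T, D = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        v = x.view(B, T, self.n_heads, self.d_head).transpose(1, 2)  # identity value matrix
        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        x = (attn @ v).transpose(1, 2).reshape(B, T, D)  # identity output projection
        return self.mlp(x)  # no residual branches, no LayerNorm
```

Dropping the value and output projections and the residual additions is where the reported parameter and throughput savings come from; the paper's contribution is showing, via signal propagation arguments and experiments, how to remove them without hurting per-update training speed.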

Publication date: 6 Nov 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2311.01906