Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems
The authors investigate ‘Eureka-moments’ in transformers trained on multi-step tasks: sudden, rapid improvements that occur after training and validation losses have stagnated for a long time. They found that…