This paper studies sharpness dynamics in neural network training, where 'sharpness' denotes the top eigenvalue of the Hessian of the loss. Using a minimal model, a 2-layer linear network trained on a single example, the authors reproduce several phenomena observed in practice: an early decrease in sharpness, progressive sharpening, and the edge of stability. They also derive conditions under which the edge of stability occurs and show a period-doubling route to chaos as the learning rate is increased. The paper discusses how these findings may generalize to more realistic settings.
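A minimal sketch of this setting, under simplifying assumptions not taken from the paper (scalar weights u and v, input x = 1, target y = 1, illustrative learning rates): a 2-layer linear network f(x) = v·u·x trained on one example with squared loss, tracking the sharpness (top eigenvalue of the 2×2 Hessian) during gradient descent. With a small learning rate the iterates converge and the final sharpness stays below 2/lr; with a larger one, gradient descent cannot settle at the minimum and instead oscillates at the edge of stability.

```python
import numpy as np

# Hypothetical toy setup (values are illustrative, not from the paper):
# 2-layer linear net f(x) = v * u * x, one example (x, y) = (1, 1),
# squared loss L = (u*v*x - y)^2 / 2.
x, y = 1.0, 1.0

def loss(u, v):
    return 0.5 * (u * v * x - y) ** 2

def grad(u, v):
    r = u * v * x - y                        # residual
    return np.array([r * v * x, r * u * x])

def sharpness(u, v):
    # Top eigenvalue of the 2x2 Hessian of L with respect to (u, v).
    r = u * v * x - y
    H = np.array([[(v * x) ** 2,          r * x + u * v * x ** 2],
                  [r * x + u * v * x ** 2, (u * x) ** 2]])
    return np.linalg.eigvalsh(H)[-1]

def train(u, v, lr, steps=500):
    traj = []
    for _ in range(steps):
        gu, gv = grad(u, v)
        u, v = u - lr * gu, v - lr * gv
        traj.append(sharpness(u, v))
    return u, v, traj

# Small learning rate: GD converges; sharpness settles below 2/lr.
u, v, traj = train(0.1, 0.1, lr=0.5)
print(loss(u, v), traj[-1], 2 / 0.5)

# Larger learning rate (lr > 1): the balanced minimum has sharpness 2,
# which exceeds 2/lr, so GD cannot settle there -- the iterates stay
# bounded but oscillate, with sharpness hovering around 2/lr.
u, v, traj = train(0.1, 0.1, lr=1.5)
print(loss(u, v), traj[-1], 2 / 1.5)
```

Pushing the learning rate higher in this toy map is what produces the period-doubling route to chaos discussed in the paper.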

Publication date: 6 Nov 2023
Project Page: Not Provided
Paper: https://arxiv.org/pdf/2311.02076