The article investigates the joint evolution of training dynamics under stochastic gradient descent (SGD) and the spectra of the empirical Hessian and gradient matrices. The authors prove that, in two canonical classification tasks involving multi-class high-dimensional Gaussian mixtures and either one- or two-layer neural networks, the SGD trajectory rapidly aligns with emerging low-rank outlier eigenspaces of the Hessian and gradient matrices. In multi-layer settings this alignment occurs layer by layer, with the final layer's outlier eigenspace evolving over the course of training. The study contributes to understanding how the spectra of Hessian and information matrices evolve during training in overparametrized networks.
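To make the alignment phenomenon concrete, below is a minimal, hypothetical sketch, not the paper's actual experiments: a single softmax layer is trained with online SGD on a synthetic high-dimensional Gaussian mixture, and we track what fraction of the (normalized) weight vector lies in the top outlier eigenspace of the empirical gradient second-moment matrix. All names, dimensions, and hyperparameters (`sample`, `outlier_alignment`, `d`, `k`, the learning rate, etc.) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a k-class Gaussian mixture in R^d,
# classified by a single softmax layer trained with online SGD.
d, k, lr, steps = 64, 3, 0.05, 3000
means = 4.0 * rng.normal(size=(k, d)) / np.sqrt(d)  # well-separated class means

def sample(n):
    """Draw n points from the k-component Gaussian mixture."""
    y = rng.integers(k, size=n)
    x = means[y] + rng.normal(size=(n, d))
    return x, y

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def grad(W, x, y):
    """Per-sample cross-entropy gradient for the linear model: (p - e_y) x^T."""
    p = softmax(x @ W.T)                    # (n, k) predicted probabilities
    p[np.arange(len(y)), y] -= 1.0
    return np.einsum('ni,nj->nij', p, x)    # (n, k, d) per-sample gradients

def outlier_alignment(W, n_est=2000, rank=k):
    """Norm fraction of the weight vector captured by the top-`rank`
    eigenspace of the empirical gradient second-moment ("G") matrix."""
    x, y = sample(n_est)
    g = grad(W, x, y).reshape(n_est, -1)    # flatten gradients to (n, k*d)
    G = g.T @ g / n_est                     # empirical second-moment matrix
    _, vecs = np.linalg.eigh(G)             # eigenvalues in ascending order
    top = vecs[:, -rank:]                   # candidate outlier eigenvectors
    w = W.ravel() / np.linalg.norm(W)
    return np.linalg.norm(top.T @ w)        # in [0, 1]; 1 = full alignment

W = rng.normal(size=(k, d)) / np.sqrt(d)    # random initialization
for t in range(steps):
    x, y = sample(1)
    W -= lr * grad(W, x, y)[0]              # one online-SGD step
    if t % 500 == 0:
        print(f"step {t:4d}  alignment with top-{k} eigenspace: "
              f"{outlier_alignment(W):.3f}")
```

In a setup like this, the alignment score typically rises toward 1 early in training, echoing the paper's theme that the SGD trajectory rapidly concentrates in the emerging low-rank outlier eigenspace rather than exploring the bulk of parameter space.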
Publication date: 4 Oct 2023
Project Page: https://arxiv.org/abs/2310.03010
Paper: https://arxiv.org/pdf/2310.03010