High-dimensional SGD aligns with emerging outlier eigenspaces
The article investigates how training dynamics under stochastic gradient descent (SGD) evolve jointly with the spectra of the empirical Hessian and gradient matrices. The authors demonstrate that in two canonical…