This paper offers a new way of understanding the Adam optimizer, a popular choice for training deep neural networks. Despite its practical success, the theoretical understanding of Adam's algorithmic components has been limited. The authors propose that Adam corresponds to a principled online learning framework called Follow-the-Regularized-Leader (FTRL), and they examine the benefits of Adam's algorithmic components from this online learning perspective. The paper suggests that designing a good optimizer can be reduced to designing a good online learner.
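For reference, here is a minimal NumPy sketch of the standard Adam update, the textbook formulation from Kingma & Ba rather than code from this paper, showing the algorithmic components (moving averages, bias correction, per-coordinate scaling) that the authors reinterpret through the FTRL lens:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update step."""
    # Exponential moving averages of the gradient and its elementwise square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction counteracts the zero initialization of m and v.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-coordinate step scaled by the adaptive second-moment estimate.
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy usage: minimize f(w) = ||w||^2 from a random start.
w = np.random.randn(5)
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 501):
    grad = 2 * w              # gradient of ||w||^2
    w, m, v = adam_step(w, grad, m, v, t)
print(w)                      # close to the zero vector
```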

Publication date: 2 Feb 2024
Project Page: https://arxiv.org/abs/2402.01567
Paper: https://arxiv.org/pdf/2402.01567