Why Do We Need Weight Decay in Modern Deep Learning?
Despite the widespread use of weight decay in training deep networks, its role is not well understood. This study demonstrates how weight decay modifies the optimization dynamics in overparameterized deep…