This article summarizes the development of a Stochastic Two-Point (S2P) approach to zeroth-order optimization of deep models. Zeroth-order methods are becoming increasingly popular because they can optimize a model using only forward passes, which makes them far less memory-intensive and resource-demanding than first-order methods that rely on backpropagation. The new S2P method offers promising theoretical convergence properties, and its accelerated variant (AS2P) is even more efficient at optimizing objectives for large deep models, including language models. The article provides comprehensive empirical results demonstrating AS2P's effectiveness and its advantage over standard zeroth-order methods in both training speed and stability.
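The summary doesn't spell out the S2P update rule itself, so as a rough sketch of the general idea such methods build on, here is the classic two-point zeroth-order gradient estimator: perturb the parameters along a random direction and use the difference of two forward-pass losses to approximate the directional derivative. Names like `two_point_grad_estimate`, `loss_fn`, and the smoothing radius `mu` are illustrative, not taken from the paper.

```python
import numpy as np

def two_point_grad_estimate(loss_fn, theta, mu=1e-3, rng=None):
    """Generic two-point zeroth-order gradient estimate.

    Evaluates the loss at theta + mu*u and theta - mu*u for a random
    direction u, and uses the finite difference to approximate the
    directional derivative -- no backpropagation required.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(theta.shape)           # random search direction
    delta = loss_fn(theta + mu * u) - loss_fn(theta - mu * u)
    return (delta / (2.0 * mu)) * u                # scaled directional estimate

# Illustrative usage: minimize a toy quadratic with plain SGD on the estimate.
theta = np.ones(10)
loss = lambda t: float(np.sum(t ** 2))
for _ in range(500):
    theta -= 0.05 * two_point_grad_estimate(loss, theta)
print(round(loss(theta), 6))  # should be close to 0
```

In S2P-style methods, an estimate of this kind stands in for the backpropagated gradient in the optimizer step, which is why only forward passes are needed and no activations have to be stored.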
Publication date: 5 Feb 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2402.01621