Efficient Stagewise Pretraining via Progressive Subnetworks
The paper discusses the limitations of current stagewise pretraining methods for large language models and proposes a new framework, progressive subnetwork training. The focus is on a simple instantiation of…
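To make the idea concrete, below is a minimal Python/PyTorch-style sketch of progressive subnetwork training. It assumes the subnetwork at each step is a random subset of a model's residual layers and that the subset size grows across stages; the class names, sampling rule, and stage schedule are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch of progressive subnetwork training (assumed instantiation:
# train a random sub-path of layers each step, growing the path length by stage).
import random
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Toy residual block standing in for a transformer layer."""
    def __init__(self, dim: int):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.ff(x)

class ProgressiveSubnetModel(nn.Module):
    """Keeps the full model in memory; each forward pass runs only a sampled sub-path."""
    def __init__(self, dim: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList([ResidualBlock(dim) for _ in range(num_layers)])
        self.head = nn.Linear(dim, dim)

    def forward(self, x, path_len: int):
        # Sample which layers participate this step; skipped layers act as the
        # identity on the residual stream, so the full model stays well-defined.
        active = sorted(random.sample(range(len(self.layers)), k=path_len))
        for i in active:
            x = self.layers[i](x)
        return self.head(x)

def train(model, data_iter, stages, steps_per_stage, lr=1e-3):
    """stages: increasing sub-path lengths, e.g. [3, 6, 12] for a 12-layer model."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for path_len in stages:
        for _ in range(steps_per_stage):
            x, y = next(data_iter)
            loss = nn.functional.mse_loss(model(x, path_len), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
```

In this sketch, only the sampled layers receive gradients on a given step, so early stages are cheaper per step, while later stages approach full-model training as the path length grows.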