JUST ONE BYTE (PER GRADIENT): A NOTE ON LOW-BANDWIDTH DECENTRALIZED LANGUAGE MODEL FINETUNING USING SHARED RANDOMNESS
This paper presents a low-bandwidth, decentralized language model fine-tuning approach that leverages shared randomness. The method extends memory-efficient Simultaneous Perturbation Stochastic Approximation (SPSA) and uses shared…
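To make the idea concrete, here is a minimal sketch (not the authors' implementation) of the core mechanism the title and abstract suggest: SPSA-style zeroth-order fine-tuning in which workers agree on a random seed, so the perturbation direction can be regenerated locally and the only value exchanged per step is a single scalar projected gradient (quantizable to roughly one byte). All names and hyperparameters below (loss_fn, params, seed, epsilon, lr) are illustrative assumptions, and the quadratic loss stands in for a language model loss on a local data shard.

```python
import numpy as np

def spsa_projected_gradient(loss_fn, params, seed, epsilon=1e-3):
    """Estimate the directional derivative of loss_fn along a seeded random direction.

    Because the perturbation direction z is regenerated from the shared seed,
    the only value a worker needs to communicate is the scalar returned here.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(params.shape)            # shared random direction
    loss_plus = loss_fn(params + epsilon * z)        # evaluation at +epsilon * z
    loss_minus = loss_fn(params - epsilon * z)       # evaluation at -epsilon * z
    return (loss_plus - loss_minus) / (2 * epsilon)  # scalar "projected gradient"

def apply_update(params, seed, projected_grad, lr=1e-2):
    """Reconstruct the same direction z from the shared seed and take the SPSA step."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(params.shape)
    return params - lr * projected_grad * z

if __name__ == "__main__":
    # Toy usage: a quadratic "loss" standing in for an LM loss.
    target = np.ones(10)
    loss_fn = lambda p: float(np.sum((p - target) ** 2))
    params = np.zeros(10)
    for step in range(200):
        seed = step  # shared randomness: the seed schedule is known to every worker
        g = spsa_projected_gradient(loss_fn, params, seed)
        # In the decentralized setting, g (one scalar per step) is what machines
        # would exchange; here we simply apply it locally.
        params = apply_update(params, seed, g)
    print("final loss:", loss_fn(params))
```

Under these assumptions, the bandwidth savings come from never transmitting the perturbation vector or the full gradient: the seed determines the direction, and the scalar determines the step size along it.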