The paper addresses the cost of training large-scale video models, which has traditionally demanded extensive compute and time. The authors show that such models can be trained on a single machine with eight GPUs in one day. They identify three bottlenecks (IO, CPU, and GPU computation) and optimize each of them. The resulting video training pipeline reaches higher accuracy with one-eighth of the computation used by previous methods. The authors also address challenges specific to video loading and pre-processing.
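
To make the idea of locating a bottleneck concrete, here is a minimal sketch of how one might split a training loop's wall-clock time into waiting for data (IO and CPU pre-processing) and the training step. It is not the authors' method or code: `profile_loop`, `slow_loader`, and `fake_step` are hypothetical names invented for this illustration, and the loader only sleeps to simulate video decoding.

```python
import time
from typing import Callable, Dict, Iterable, Iterator


def profile_loop(batches: Iterable, step: Callable[[object], None]) -> Dict[str, float]:
    """Split wall-clock time into data waiting and compute (hypothetical helper)."""
    load_s = 0.0
    compute_s = 0.0
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent here is IO plus CPU pre-processing
        except StopIteration:
            break
        t1 = time.perf_counter()
        step(batch)  # time spent here is the training step
        t2 = time.perf_counter()
        load_s += t1 - t0
        compute_s += t2 - t1
    total = load_s + compute_s
    return {
        "load_s": load_s,
        "compute_s": compute_s,
        "load_fraction": load_s / total if total > 0 else 0.0,
    }


def slow_loader(n: int, delay: float) -> Iterator[int]:
    """Stand-in for a video loader; the sleep simulates decoding cost."""
    for i in range(n):
        time.sleep(delay)
        yield i


def fake_step(batch: object) -> None:
    """Stand-in for a forward/backward pass; deliberately cheap."""
    sum(range(100))


stats = profile_loop(slow_loader(5, 0.01), fake_step)
print(f"fraction of time waiting for data: {stats['load_fraction']:.2f}")
```

A high `load_fraction` suggests the pipeline is limited by loading or pre-processing rather than by the model itself. With a real GPU framework, kernels are typically launched asynchronously, so the step would need an explicit synchronization before the clock is read for the compute figure to be meaningful.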

Publication date: 28 Sep 2023
Project Page: https://github.com/zhaoyue-zephyrus/AVION
Paper: https://arxiv.org/pdf/2309.16669