This paper reviews the evolution of large language model (LLM) training techniques and inference deployment technologies. On the training side, it covers data preprocessing, model architecture, pre-training tasks, parallel training, and fine-tuning. On the inference side, it covers model compression, parallel computation, memory scheduling, and structural optimization. The authors highlight the growing focus on cost-effective training and deployment of LLMs and identify it as a likely direction for future development.

 

Publication date: 4 Jan 2024
Project Page: https://arxiv.org/abs/2401.02038v1
Paper: https://arxiv.org/pdf/2401.02038