This article discusses the development of DeepSeek LLM, a project dedicated to advancing open-source language models. The authors study scaling laws and present distinctive findings that facilitate the scaling of large-scale models in two widely used open-source configurations, 7B and 67B. The project has built a pretraining dataset of 2 trillion tokens that is continuously expanding. The DeepSeek LLM Base models undergo supervised fine-tuning (SFT) and direct preference optimization (DPO), producing the DeepSeek Chat models. Evaluation results show that DeepSeek LLM 67B surpasses LLaMA-2 70B across a range of benchmarks, particularly in code, mathematics, and reasoning. Furthermore, open-ended evaluations reveal that DeepSeek LLM 67B Chat outperforms GPT-3.5.
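
For context on the alignment stage mentioned above: direct preference optimization fine-tunes the policy directly on preference pairs, without training a separate reward model. The sketch below shows the standard DPO loss on per-sequence log-probabilities; the function name, tensor shapes, and the beta value are illustrative assumptions and are not taken from the paper.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss (Rafailov et al., 2023) on sequence log-probs.

    Each argument is a tensor of shape (batch,) holding the summed
    log-probability that the policy or frozen reference model assigns
    to the chosen or rejected response. `beta` controls how strongly
    the policy is kept close to the reference model.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

The SFT stage that precedes this is ordinary next-token cross-entropy on curated instruction data; DPO then only needs pairwise preference labels rather than scalar reward scores.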

Publication date: 5 Jan 2024
Project Page: https://arxiv.org/abs/2401.02954v1
Paper: https://arxiv.org/pdf/2401.02954