This article discusses the development of DeepSeek LLM, a project dedicated to advancing open-source language models. The authors study scaling laws and present distinctive findings that facilitate the scaling of large-scale models in two widely used open-source configurations, 7B and 67B. The project has built a pretraining dataset of 2 trillion tokens that is continuously expanding. The DeepSeek LLM Base models undergo supervised fine-tuning (SFT) and direct preference optimization (DPO), producing the DeepSeek Chat models. Evaluation results show that DeepSeek LLM 67B surpasses LLaMA-2 70B across a range of benchmarks, particularly in code, mathematics, and reasoning. Furthermore, open-ended evaluations reveal that DeepSeek LLM 67B Chat outperforms GPT-3.5.
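
For context on the alignment stage mentioned above: direct preference optimization fine-tunes the policy directly on preference pairs, without training a separate reward model. The sketch below shows the standard DPO loss on per-sequence log-probabilities; the function name, tensor shapes, and the beta value are illustrative assumptions and are not taken from the paper.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss (Rafailov et al., 2023) on sequence log-probs.

    Each argument is a tensor of shape (batch,) holding the summed
    log-probability that the policy or frozen reference model assigns
    to the chosen or rejected response. `beta` controls how strongly
    the policy is kept close to the reference model.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

The SFT stage that precedes this is ordinary next-token cross-entropy on curated instruction data; DPO then only needs pairwise preference labels rather than scalar reward scores.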

Publication date: 5 Jan 2024
Project Page: https://arxiv.org/abs/2401.02954v1
Paper: https://arxiv.org/pdf/2401.02954