The article presents SeaLLMs, a series of Large Language Models (LLMs) focused on Southeast Asian languages. The models are designed to address the linguistic bias of existing LLMs, which heavily favor English and other high-resource languages. Built on Llama-2, SeaLLMs are further enhanced with an extended vocabulary, specialized instruction tuning, and alignment tuning, allowing them to better capture the intricacies of regional languages and reflect local cultural norms. The SeaLLM-13b models show superior performance across a wide range of linguistic tasks, outperforming comparable open-source models, and they also outperform ChatGPT-3.5 in non-Latin-script languages such as Thai, Khmer, Lao, and Burmese.
Publication date: 1 Dec 2023
Project Page: https://github.com/DAMO-NLP-SG/SeaLLMs
Paper: https://arxiv.org/pdf/2312.00738