The article presents SPHINX-X, an extension of the SPHINX framework for multi-modal large language models (MLLMs). SPHINX-X improves architecture and training efficiency by removing redundant visual encoders, collapsing the original multi-stage training into a single stage, and skipping fully-padded sub-images instead of encoding them. It is trained on a comprehensive multi-domain, multi-modal dataset assembled from publicly available resources covering language, vision, and vision-language tasks. By training over base LLMs of different sizes, the authors obtain a family of MLLMs with varying parameter scales and multilingual abilities, and report a strong correlation between multi-modal performance and the scale of data and parameters.
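
To make the "skip fully-padded sub-images" idea concrete, here is a minimal PyTorch sketch (not the authors' released code): when a high-resolution image is split into sub-image crops, crops that contain only padding are replaced by a single learnable skip token rather than being passed through the visual encoder. The names `SubImageTokenizer`, `skip_token`, and `pad_value`, as well as the encoder's output shape, are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SubImageTokenizer(nn.Module):
    """Encode sub-image crops, emitting a learnable skip token for fully-padded ones."""

    def __init__(self, encoder: nn.Module, embed_dim: int, pad_value: float = 0.0):
        super().__init__()
        self.encoder = encoder                      # assumed to map (1, C, H, W) -> (1, N, D) visual tokens
        self.skip_token = nn.Parameter(torch.zeros(1, 1, embed_dim))  # learnable skip token
        self.pad_value = pad_value                  # value used to pad empty crops (assumption)

    def forward(self, sub_images: torch.Tensor) -> list[torch.Tensor]:
        # sub_images: (S, C, H, W) crops produced by splitting one high-resolution image
        outputs = []
        for crop in sub_images:
            if torch.all(crop == self.pad_value):
                # fully-padded crop: emit one skip token, no encoder forward pass
                outputs.append(self.skip_token.squeeze(0))      # shape (1, D)
            else:
                outputs.append(self.encoder(crop.unsqueeze(0)).squeeze(0))  # shape (N, D)
        return outputs
```

The design point this sketch illustrates is the efficiency gain: padded crops contribute a single token to the LLM's input sequence instead of a full grid of visual tokens, shortening sequences and avoiding wasted encoder compute.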

 

Publication date: 8 Feb 2024
Project Page: https://github.com/Alpha-VLLM/LLaMA2-Accessory
Paper: https://arxiv.org/pdf/2402.05935