The article introduces SPHINX-X, a series of Multi-modal Large Language Models (MLLMs) built upon SPHINX. To improve architecture and training efficiency, the series removes redundant visual encoders, bypasses fully-padded sub-images with skip tokens, and simplifies the original multi-stage training into a single-stage, all-in-one paradigm. The paper also assembles a comprehensive multi-domain, multi-modal dataset from publicly available resources covering language, vision, and vision-language tasks. Across evaluations, the SPHINX-X models show a strong correlation between multi-modal performance and the scale of both data and parameters.
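
To make the skip-token idea concrete, below is a minimal PyTorch sketch of how fully-padded sub-images from an any-resolution partitioning might be collapsed into a single learnable token instead of a full set of visual tokens. The function and parameter names (`is_fully_padded`, `encode_sub_images`, the token counts) are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
import torch

# Assumed shapes for illustration: each sub-image yields N visual tokens of
# dimension D, and one learnable skip token stands in for fully-padded crops.
NUM_TOKENS_PER_SUBIMAGE = 256
EMBED_DIM = 4096
skip_token = torch.nn.Parameter(torch.zeros(1, EMBED_DIM))  # learned embedding

def is_fully_padded(sub_image: torch.Tensor) -> bool:
    """True if the sub-image contains only padding pixels (zero-padding assumed)."""
    return bool((sub_image == 0).all())

def encode_sub_images(sub_images, visual_encoder):
    """Encode sub-images, replacing fully-padded ones with a single skip token.

    sub_images: list of (3, H, W) tensors from any-resolution partitioning.
    visual_encoder: maps (1, 3, H, W) to (1, NUM_TOKENS_PER_SUBIMAGE, EMBED_DIM).
    Returns a (1, T, EMBED_DIM) sequence; T shrinks with every skipped crop.
    """
    pieces = []
    for crop in sub_images:
        if is_fully_padded(crop):
            # Spend 1 token instead of NUM_TOKENS_PER_SUBIMAGE on an empty crop.
            pieces.append(skip_token.unsqueeze(0))            # (1, 1, D)
        else:
            pieces.append(visual_encoder(crop.unsqueeze(0)))  # (1, N, D)
    return torch.cat(pieces, dim=1)
```

The payoff of this pattern is a shorter visual token sequence for images whose aspect ratio leaves some sub-image slots entirely padded, which is the efficiency gain the summary attributes to skip tokens.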

Publication date: 8 Feb 2024
Project Page: https://github.com/Alpha-VLLM/LLaMA2-Accessory
Paper: https://arxiv.org/pdf/2402.05935