CLIP-ViT-L-336px Papers

Artificial Intelligence, Computation and Language

Improved Baselines with Visual Instruction Tuning

root October 8, 2023

The study focuses on improving large multimodal models (LMMs), specifically the LLaVA model, through visual instruction tuning. By using CLIP-ViT-L-336px with an MLP projection and adding academic-task-oriented VQA data, the…
