CrisisViT is a robust Vision Transformer for crisis image classification. The paper argues that images posted to social media during emergencies can provide valuable, real-time information, but that their sheer volume requires an efficient way to analyze and categorize them. The authors propose CrisisViT, a family of transformer-based deep learning models for this task. The models were developed using the new Incidents1M crisis image dataset and evaluated on the Crisis image benchmark dataset. The results show that CrisisViT outperforms previous approaches in classifying emergency type, image relevance, humanitarian category, and damage severity, and that additional training on Incidents1M improves accuracy further.

Publication date: 5 Jan 2024
Project Page: https://arxiv.org/abs/2401.02838v1
Paper: https://arxiv.org/pdf/2401.02838