LT-ViT: A Vision Transformer for multi-label Chest X-ray classification

This article summarizes LT-ViT, a Vision Transformer (ViT) for multi-label Chest X-ray (CXR) classification. Unlike previous ViTs, LT-ViT aggregates information from multiple scales, which improves vision-only training on CXRs. The model applies attention between image tokens and auxiliary tokens that represent the labels. According to the study, LT-ViT outperforms existing pure ViTs on two CXR datasets, generalizes to other pre-training methods, and enables model interpretability without Grad-CAM and its variants.
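To make the idea of attention between image tokens and label tokens more concrete, here is a minimal, dependency-free sketch. It is not the authors' implementation: the function names (`label_token_attention`, `multi_label_scores`), the toy sizes, and the random weights are all assumptions made for illustration, and the sketch leaves out learned query/key/value projections, multiple heads, and the multi-scale aggregation mentioned above. It only shows one label token per class attending over image patch tokens, followed by an independent sigmoid score per label.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def label_token_attention(label_tokens, image_tokens):
    """Each label token (query) attends over all image tokens (keys/values).

    Returns (updated_label_tokens, attention_maps), where
    attention_maps[i][j] is the weight label i places on image patch j.
    Hypothetical helper: no learned projections, single head.
    """
    dim = len(image_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    updated, attention_maps = [], []
    for query in label_tokens:
        weights = softmax([dot(query, key) * scale for key in image_tokens])
        # Weighted average of the image tokens, one value per feature dimension.
        mixed = [
            sum(w * token[d] for w, token in zip(weights, image_tokens))
            for d in range(dim)
        ]
        updated.append(mixed)
        attention_maps.append(weights)
    return updated, attention_maps


def multi_label_scores(label_embeddings, head_weights):
    """Independent sigmoid score per label (multi-label, so no softmax across labels)."""
    return [
        1.0 / (1.0 + math.exp(-dot(emb, w)))
        for emb, w in zip(label_embeddings, head_weights)
    ]


random.seed(0)
NUM_LABELS, NUM_PATCHES, DIM = 3, 4, 8  # toy sizes, assumed for illustration
image_tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_PATCHES)]
label_tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_LABELS)]
head_weights = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_LABELS)]

label_embeddings, attention_maps = label_token_attention(label_tokens, image_tokens)
scores = multi_label_scores(label_embeddings, head_weights)
```

In a sketch like this, each row of `attention_maps` is a distribution over image patches for one label. That is one plausible way a per-label map could be read off the attention weights directly, without a gradient-based method such as Grad-CAM; how LT-ViT actually derives its maps is described in the paper.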

Publication date: 14 Nov 2023
Project Page: Not provided
Paper: https://arxiv.org/pdf/2311.07263
