The paper presents the Scattering Vision Transformer (SVT), a new approach to addressing challenges in vision tasks. SVT incorporates a spectral scattering network that captures intricate image details and separates low-frequency from high-frequency components. It also introduces a spectral gating network that performs token and channel mixing at reduced computational complexity. SVT achieves state-of-the-art performance on ImageNet while substantially reducing the number of parameters and FLOPs, and it outperforms other transformers in transfer learning on standard datasets such as CIFAR-10, CIFAR-100, Oxford Flowers, and Stanford Cars.
Publication date: 2 Nov 2023
Project Page: https://badripatro.github.io/svt/
Paper: https://arxiv.org/pdf/2311.01310
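The summary names SVT's components but not how they fit together. The NumPy sketch below illustrates two of the ideas it mentions: splitting a feature map into low- and high-frequency components without losing information, and mixing tokens and channels in that spectral domain. It is not the authors' implementation. It assumes a single-level orthonormal Haar wavelet as a simple stand-in for the paper's scattering transform. The gating scheme (an elementwise weight on the low band plus per-band channel mixing via `np.einsum` on the high bands) and all function names (`haar_split`, `haar_merge`, `spectral_gate`, `svt_style_mixer`) are hypothetical.

```python
import numpy as np


def haar_split(x):
    """Single-level orthonormal 2D Haar transform of an (H, W, C) map.

    Stand-in for SVT's scattering transform (assumption). Returns the
    low-frequency band (H/2, W/2, C) and the three high-frequency bands
    stacked as (3, H/2, W/2, C).
    """
    assert x.shape[0] % 2 == 0 and x.shape[1] % 2 == 0, "H and W must be even"
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0
    lh = (a - b + c - d) / 2.0  # horizontal detail
    hl = (a + b - c - d) / 2.0  # vertical detail
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return low, np.stack([lh, hl, hh])


def haar_merge(low, high):
    """Exact inverse of haar_split: rebuild the (H, W, C) map."""
    lh, hl, hh = high
    h2, w2, ch = low.shape
    x = np.empty((2 * h2, 2 * w2, ch), dtype=low.dtype)
    x[0::2, 0::2] = (low + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (low - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (low + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (low - lh - hl + hh) / 2.0
    return x


def spectral_gate(low, high, w_low, w_chan):
    """Hypothetical gating: not the paper's exact formulation.

    w_low:  (H/2, W/2, C) elementwise weights on the low band (token gate).
    w_chan: (3, C, C) per-band channel-mixing matrices for the high bands.
    """
    low = low * w_low
    # Einstein summation: mix channels c -> d separately for each band k.
    high = np.einsum("khwc,kcd->khwd", high, w_chan)
    return low, high


def svt_style_mixer(x, w_low, w_chan):
    """Split into frequency bands, gate/mix them, and transform back."""
    low, high = haar_split(x)
    low, high = spectral_gate(low, high, w_low, w_chan)
    return haar_merge(low, high)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))  # toy (H, W, C) feature map
w_low = rng.standard_normal((4, 4, 4))
w_chan = rng.standard_normal((3, 4, 4))
y = svt_style_mixer(x, w_low, w_chan)
print(y.shape)  # (8, 8, 4)
```

Because the Haar transform is orthonormal and the high-frequency bands are kept rather than discarded, `haar_merge(*haar_split(x))` recovers `x` exactly. The 2x spatial downsampling of each band therefore loses no information. With an all-ones `w_low` and identity `w_chan`, the mixer reduces to the identity map.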