Scattering Vision Transformer: Spectral Mixing Matters

The paper presents the Scattering Vision Transformer (SVT), a new approach to vision tasks. SVT incorporates a spectrally scattering network that captures intricate image details and separates low-frequency from high-frequency components. It also introduces a spectral gating network for token and channel mixing, which reduces complexity. According to the authors, SVT achieves state-of-the-art performance on ImageNet with a significant reduction in parameters and FLOPs. It also outperforms other transformers in transfer learning on standard datasets such as CIFAR-10, CIFAR-100, Oxford Flower, and Stanford Car.
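To make the idea of separating low-frequency and high-frequency components concrete, here is a minimal sketch. It is not the paper's scattering network: it swaps in a one-level Haar transform on a single row of pixel values, and the names haar_split and haar_merge are hypothetical. The point it illustrates is that such a split loses no information, since the two bands can be merged back into the original signal.

```python
# Illustrative stand-in only: a one-level Haar split, NOT the SVT scattering network.
# haar_split and haar_merge are hypothetical names chosen for this sketch.

def haar_split(signal):
    """Split an even-length sequence into low- and high-frequency bands.

    low  = pairwise averages (coarse content)
    high = pairwise half-differences (fine detail)
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high


def haar_merge(low, high):
    """Invert haar_split and rebuild the original sequence."""
    merged = []
    for average, detail in zip(low, high):
        merged.extend([average + detail, average - detail])
    return merged


row = [4.0, 6.0, 10.0, 2.0]        # one row of pixel intensities (made-up values)
low, high = haar_split(row)         # low == [5.0, 6.0], high == [-1.0, 4.0]
restored = haar_merge(low, high)    # equals the original row
```

Each band is half the length of the input, so the split halves the resolution while the pair of bands together still holds everything needed to reconstruct the input.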

Publication date: 2 Nov 2023
Project Page: https://badripatro.github.io/svt/
Paper: https://arxiv.org/pdf/2311.01310
