Vision Transformers Need Registers
This paper identifies artifacts in the feature maps of both supervised and self-supervised Vision Transformer (ViT) networks. The artifacts are high-norm tokens appearing primarily in low-informative…