Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing

The paper “Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing” investigates a property of Vision Transformers (ViTs) called “patch selectivity”: the ability to ignore out-of-context patches in an image, which is hypothesized to let ViTs handle occlusion more gracefully than CNNs. The authors propose a data augmentation method, “Patch Mixing,” to instill this ability in Convolutional Neural Networks (CNNs). During training, Patch Mixing pastes patches from one image onto another and interpolates the class labels of the two images accordingly.
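
To make the mechanism concrete, below is a minimal PyTorch sketch of such a patch-level mixing augmentation. It is illustrative only: the function name `patch_mixing`, the fixed `mix_ratio`, and the pairing of images via a random in-batch permutation are assumptions of this sketch, not the authors' exact implementation, which may sample the number of mixed patches and handle labels differently.

```python
import torch
import torch.nn.functional as F


def patch_mixing(images, labels, num_classes, patch_size=16, mix_ratio=0.3):
    """Replace a fraction of each image's patches with the corresponding
    patches from a random partner image in the batch, and interpolate the
    one-hot labels by the fraction of replaced area (a sketch, not the
    paper's reference implementation).
    """
    B, C, H, W = images.shape
    gh, gw = H // patch_size, W // patch_size
    num_patches = gh * gw
    num_mixed = int(mix_ratio * num_patches)

    # Pair each image with a random partner from the same batch.
    perm = torch.randperm(B, device=images.device)

    # Reshape into a grid of patches: (B, C, gh, gw, p, p).
    patches = images.view(B, C, gh, patch_size, gw, patch_size)
    patches = patches.permute(0, 1, 2, 4, 3, 5)

    mixed = patches.clone()
    for i in range(B):
        j = int(perm[i])  # partner image index
        # Choose which patch positions to overwrite in image i.
        idx = torch.randperm(num_patches, device=images.device)[:num_mixed]
        rows, cols = idx // gw, idx % gw
        mixed[i, :, rows, cols] = patches[j, :, rows, cols]

    # Fold the patch grid back into full images.
    mixed_images = mixed.permute(0, 1, 2, 4, 3, 5).reshape(B, C, H, W)

    # Interpolate labels in proportion to the mixed area (CutMix-style).
    lam = num_mixed / num_patches
    one_hot = F.one_hot(labels, num_classes).float()
    mixed_labels = (1.0 - lam) * one_hot + lam * one_hot[perm]
    return mixed_images, mixed_labels
```

In a training loop, the mixed images would be fed to the CNN and the loss computed against the resulting soft labels (e.g., with a soft-target cross-entropy), much as in CutMix-style training.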

The study finds that while ViTs possess patch selectivity natively, CNNs can acquire it when trained with Patch Mixing, which yields improved performance on occlusion benchmarks. This suggests that the training method hardwires into CNNs an ability that ViTs already have. The authors also introduce two new datasets for evaluating image classifiers under occlusion, a further contribution to the field.

Publication date: June 30, 2023
Project Page: https://arielnlee.github.io/PatchMixing/
Paper: https://arxiv.org/pdf/2306.17848.pdf