This study investigates adversarial attacks through the lens of diffusion models, not to harden image classifiers, but to use a diffusion model to detect and analyze the anomalies that such attacks introduce into images. In particular, it examines how the distributions of adversarial examples align after the images are transformed by a diffusion model. The approach is evaluated on CIFAR-10 and ImageNet, where it effectively distinguishes benign images from attacked ones.
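
The sketch below illustrates one plausible way such a detector could be built: noise an input with a pretrained diffusion model's forward process, reconstruct it, and use the reconstruction error as an anomaly score. This is a minimal illustration, not the paper's exact pipeline; the checkpoint name, timestep choice, and thresholding rule are assumptions for demonstration.

```python
# Minimal sketch (assumed setup, not the paper's method): score images by how
# well a pretrained CIFAR-10 DDPM reconstructs them after partial noising.
import torch
from diffusers import UNet2DModel, DDPMScheduler

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pretrained CIFAR-10 DDPM from the Hugging Face hub (illustrative checkpoint).
model = UNet2DModel.from_pretrained("google/ddpm-cifar10-32").to(device).eval()
scheduler = DDPMScheduler.from_pretrained("google/ddpm-cifar10-32")


@torch.no_grad()
def reconstruction_score(images: torch.Tensor, t: int = 100) -> torch.Tensor:
    """Diffuse images to timestep t, estimate the clean image in one step,
    and return a per-image reconstruction error.

    images: batch in [-1, 1], shape (B, 3, 32, 32).
    """
    timesteps = torch.full((images.shape[0],), t, device=device, dtype=torch.long)
    noise = torch.randn_like(images)

    # Forward (noising) process: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    noisy = scheduler.add_noise(images, noise, timesteps)

    # Predict the noise and invert the forward process to estimate x_0.
    eps_pred = model(noisy, timesteps).sample
    alpha_bar = scheduler.alphas_cumprod[t].to(device)
    x0_pred = (noisy - (1 - alpha_bar).sqrt() * eps_pred) / alpha_bar.sqrt()

    # Intuition: adversarial perturbations lie off the learned image manifold,
    # so attacked inputs tend to reconstruct less faithfully than benign ones.
    return (images - x0_pred).flatten(1).pow(2).mean(dim=1)


# Usage (hypothetical): flag images whose score exceeds a threshold fit on benign data.
# scores = reconstruction_score(images)          # images scaled to [-1, 1]
# is_adversarial = scores > benign_threshold
```

The one-step estimate of x_0 above is a simplification; a full reverse-diffusion transformation, as studied in the paper, would iterate the scheduler's denoising steps instead.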

Publication date: 12 Jan 2024
Project Page: https://arxiv.org/abs/2401.06637
Paper: https://arxiv.org/pdf/2401.06637