The paper introduces a new adversarial attack on neural networks. The attack is based on a truncated power iteration that produces sparse approximations of the leading singular vectors of the Jacobian matrices of the models' hidden layers. Evaluated on the ImageNet benchmark validation subset, the resulting perturbations fool more than 50% of the models while damaging only 5% of the pixels, and they transfer well across different models. The paper closes by emphasizing the importance of developing robust machine learning systems.
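To make the core idea concrete, here is a minimal sketch of a truncated power iteration that approximates the leading right singular vector of a matrix while keeping it sparse. This is a generic illustration of the technique, not the paper's actual implementation: the function name, the fixed iteration count, and applying it to a plain NumPy matrix (rather than a network's hidden-layer Jacobian) are all assumptions for the example.

```python
import numpy as np

def truncated_power_iteration(J, k, iters=100, seed=0):
    """Approximate the leading right singular vector of J,
    zeroing all but the k largest-magnitude entries at each
    step so the resulting direction stays k-sparse.
    (Illustrative sketch, not the paper's exact algorithm.)"""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(J.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        u = J @ v        # forward pass:  J v
        v = J.T @ u      # backward pass: J^T (J v)
        # truncation: keep only the k largest-magnitude entries
        small = np.argsort(np.abs(v))[:-k]
        v[small] = 0.0
        v /= np.linalg.norm(v)
    return v
```

Applied per hidden layer, a sparse singular vector like this identifies a small set of input coordinates (pixels) along which the layer's output is most sensitive, which is what allows the attack to perturb only a small fraction of pixels.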

 

Publication date: 25 Jan 2024
Project Page: arXiv:2401.14031v1
Paper: https://arxiv.org/pdf/2401.14031