Explainable Adversarial Attacks on Coarse-to-Fine Classifiers

📅 2025-01-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem of explainability-aware adversarial attacks against multi-stage coarse-to-fine classification models. We propose the first instance-level, semantic-aware joint optimization attack method. Our approach employs gradient-guided pixel-wise importance weighting to precisely perturb discriminative regions, thereby inducing stage-wise targeted misclassification. Crucially, we integrate Layer-wise Relevance Propagation (LRP) as an attribution mechanism directly into the attack objective function—enabling simultaneous optimization of attack efficacy and model decision interpretability. Evaluated on multiple coarse-to-fine benchmark datasets, our method achieves a targeted attack success rate exceeding 89%. LRP relevance maps confirm that perturbations are highly concentrated on semantically critical regions, substantially enhancing human interpretability of the attack behavior. This work establishes a novel paradigm for trustworthy AI security evaluation by unifying adversarial robustness assessment with post-hoc explainability requirements.
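The recipe sketched in the summary — relevance scores select the discriminative pixels, then a gradient step restricted to those pixels flips the label — can be illustrated on a toy linear "coarse" classifier. This is a minimal sketch, not the paper's implementation: gradient×input stands in for full LRP (the two coincide for a linear scorer), and all weights, sizes, and names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear coarse-stage classifier over d "pixels"; weights are random
# stand-ins for a trained model (hypothetical, for illustration only).
d = 16
W_coarse = rng.normal(size=(2, d))

def predict(W, x):
    return int(np.argmax(W @ x))

def relevance(W, x, cls):
    # Gradient x input; for a linear scorer this matches a basic LRP rule.
    return W[cls] * x

x = rng.normal(size=d)
c0 = predict(W_coarse, x)

# Importance mask: perturb only the top-25% most relevant pixels.
r = np.abs(relevance(W_coarse, x, c0))
mask = r >= np.quantile(r, 0.75)

# Signed gradient step against the current label, restricted to the mask.
grad = W_coarse[c0] - W_coarse[1 - c0]          # gradient of the score margin
margin = grad @ x                               # positive by definition of c0
eps = 1.1 * margin / np.abs(grad[mask]).sum()   # just large enough to flip
x_adv = x.copy()
x_adv[mask] -= eps * np.sign(grad[mask])

print(predict(W_coarse, x), "->", predict(W_coarse, x_adv))
```

In a coarse-to-fine setting, the fine-stage classifier would be attacked with the same rule, with the relevance maps of both stages combined into one joint objective; the sketch keeps a single stage for brevity.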

📝 Abstract
Traditional adversarial attacks typically aim to alter the predicted labels of input images by generating perturbations that are imperceptible to the human eye. However, these approaches often lack explainability. Moreover, most existing work on adversarial attacks focuses on single-stage classifiers, while multi-stage classifiers remain largely unexplored. In this paper, we introduce instance-based adversarial attacks for multi-stage classifiers, leveraging Layer-wise Relevance Propagation (LRP), which assigns relevance scores to pixels based on their influence on the classification outcome. Our approach generates explainable adversarial perturbations by using LRP to identify and target the key features critical to both coarse and fine-grained classification. Unlike conventional attacks, our method not only induces misclassification but also enhances the interpretability of the model's behavior across classification stages, as demonstrated by experimental results.
Problem

Research questions and friction points this paper is trying to address.

Interpretable Adversarial Attacks
Complex Multi-level Classification
Machine Learning Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Interpretable Adversarial Attacks
Layer-wise Relevance Propagation (LRP)
Hierarchical Classification
Akram Heidarizadeh
Dept. of Electrical and Computer Engineering, University of Central Florida, Orlando, FL, USA
Connor Hatfield
Dept. of Electrical and Computer Engineering, University of Central Florida, Orlando, FL, USA
Lorenzo Lazzarotto
School of Technology, Pontifícia Universidade Católica do Rio Grande do Sul, Porto Alegre, Brazil
HanQin Cai
Paul N. Somerville Endowed Assistant Professor, University of Central Florida
Data Science · Machine Learning · Mathematical Optimization
George Atia
Professor, University of Central Florida
Machine Learning · Explainable AI · Robust Learning and Inference · Statistical Signal Processing