🤖 AI Summary
Existing eXplainable Artificial Intelligence (XAI) methods are vulnerable to adversarial attacks in high-stakes applications, compromising explanation fidelity, system trustworthiness, and operational safety.
Method: We conduct a systematic literature review of more than 120 papers and propose the first unified taxonomy of adversarial attacks and defenses for XAI, elucidating the coupled fragility of explanations and their underlying models. We introduce a multidimensional suite of robustness evaluation metrics, comprehensively categorize seven canonical attack types (including gradient-based, mask-based, and surrogate-model attacks), and organize defenses into paradigms such as explanation regularization, adversarial training, causal intervention, and trustworthy explanation generation.
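To make the gradient-based attack category concrete, the sketch below illustrates the core idea on a hypothetical toy model (all names and the two-layer softplus network are illustrative assumptions, not the survey's method): search for a small input perturbation that shifts the gradient-based saliency explanation while leaving the model's prediction unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy 2-layer net; softplus keeps the saliency map smooth in x.
W1 = rng.normal(size=(8, 5))
w2 = rng.normal(size=8)

def softplus(z):
    return np.log1p(np.exp(z))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x):
    return w2 @ softplus(W1 @ x)  # scalar logit

def saliency(x):
    # Vanilla-gradient explanation: gradient of the logit w.r.t. the input.
    return W1.T @ (sigmoid(W1 @ x) * w2)

def explanation_attack(x, eps=0.3, steps=200, lr=0.05):
    """Finite-difference ascent on the explanation shift
    ||saliency(x') - saliency(x)||_1, constrained to keep the predicted
    label and stay inside an L-infinity ball of radius eps around x."""
    s0 = saliency(x)
    label0 = predict(x) > 0
    x_adv = x.copy()
    for _ in range(steps):
        g = np.zeros_like(x_adv)
        for i in range(len(x_adv)):  # numerical gradient of the objective
            d = np.zeros_like(x_adv)
            d[i] = 1e-4
            g[i] = (np.abs(saliency(x_adv + d) - s0).sum()
                    - np.abs(saliency(x_adv - d) - s0).sum()) / 2e-4
        cand = np.clip(x_adv + lr * np.sign(g), x - eps, x + eps)
        if (predict(cand) > 0) == label0:  # reject label-flipping steps
            x_adv = cand
    return x_adv

x = rng.normal(size=5)
x_adv = explanation_attack(x)
shift = np.abs(saliency(x_adv) - saliency(x)).sum()
```

The constraint that the prediction stays fixed while the explanation moves is precisely the "coupled fragility" the survey highlights: the model appears to behave identically, yet the attribution a user sees can be steered within a small perturbation budget.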
Contribution/Results: Our work establishes a foundational theoretical framework and actionable design principles for developing secure, robust XAI systems, bridging critical gaps between interpretability, robustness, and safety in AI deployment.