🤖 AI Summary
Background: Existing defenses against adversarial patch attacks rely heavily on prior knowledge of patch location or size, limiting their generalizability to unseen physical patches. Method: This paper proposes a patch-agnostic defense framework that leverages Concept Activation Vectors (CAVs) for feature attribution, identifying and suppressing the semantic concepts most sensitive to the perturbation—without explicitly detecting the patch's location or dimensions. Contribution/Results: It is the first work to jointly model concept interpretability and robustness, enabling a generalized defense against physical patches of arbitrary size and position. Evaluated on the Imagenette dataset with ResNet-50, the method achieves higher robust and clean accuracy than PatchCleanser, demonstrating superior stability and practicality across diverse scenarios.
📝 Abstract
Adversarial patch attacks pose a practical threat to deep learning models by forcing targeted misclassifications through localized perturbations, often realized in the physical world. Existing defenses typically assume prior knowledge of patch size or location, limiting their applicability. In this work, we propose a patch-agnostic defense that leverages concept-based explanations to identify and suppress the most influential concept activation vectors, thereby neutralizing patch effects without explicit detection. Evaluated on Imagenette with a ResNet-50, our method achieves higher robust and clean accuracy than the state-of-the-art PatchCleanser, while maintaining strong performance across varying patch sizes and locations. Our results highlight the promise of combining interpretability with robustness and suggest concept-driven defenses as a scalable strategy for securing machine learning models against adversarial patch attacks.
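The core mechanism, suppressing the concepts a model's decision is most sensitive to, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the CAV construction follows the standard TCAV recipe (a linear classifier separating concept activations from random activations, whose unit normal is the CAV), while the orthogonal-projection "suppression" step and all function names here are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_acts, random_acts):
    """Fit a linear classifier separating concept vs. random activations;
    the CAV is the unit normal of its decision boundary (TCAV-style)."""
    X = np.vstack([concept_acts, random_acts])
    y = np.array([1] * len(concept_acts) + [0] * len(random_acts))
    clf = LogisticRegression().fit(X, y)
    v = clf.coef_.ravel()
    return v / np.linalg.norm(v)

def suppress_concept(acts, cav):
    """Remove each activation's component along the CAV direction.
    (One plausible suppression operator; the paper's exact rule may differ.)"""
    return acts - np.outer(acts @ cav, cav)

# Toy demo: synthetic activations where the "concept" lies along axis 0.
rng = np.random.default_rng(0)
concept_dir = np.array([1.0, 0.0, 0.0, 0.0])
concept_acts = rng.normal(size=(50, 4)) + 3.0 * concept_dir
random_acts = rng.normal(size=(50, 4))

cav = compute_cav(concept_acts, random_acts)
acts = rng.normal(size=(10, 4))
cleaned = suppress_concept(acts, cav)
# After suppression, activations are orthogonal to the CAV direction.
print(np.abs(cleaned @ cav).max())
```

In a full defense one would rank CAVs by how strongly the (possibly patched) input's prediction depends on them and project out only the most sensitive directions before the final classification layers; the projection above is the simplest such operator.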