🤖 AI Summary
Background: Existing defenses against adversarial patch attacks rely heavily on prior knowledge of patch location or size, limiting their generalizability to unseen physical patches. Method: This paper proposes a patch-agnostic defense framework that leverages Concept Activation Vectors (CAVs) for feature attribution, identifying and suppressing the semantic concepts most sensitive to the perturbation—without explicitly detecting the patch's location or dimensions. Contribution/Results: It is the first work to jointly model concept interpretability and robustness, enabling a generalized defense against physical patches of arbitrary size and position. Evaluated on the Imagenette dataset with ResNet-50, the method achieves higher robust and clean accuracy than PatchCleanser, demonstrating superior stability and practicality across diverse scenarios.
📝 Abstract
Adversarial patch attacks pose a practical threat to deep learning models by forcing targeted misclassifications through localized perturbations, often realized in the physical world. Existing defenses typically assume prior knowledge of patch size or location, limiting their applicability. In this work, we propose a patch-agnostic defense that leverages concept-based explanations to identify and suppress the most influential concept activation vectors, thereby neutralizing patch effects without explicit detection. Evaluated on Imagenette with a ResNet-50, our method achieves higher robust and clean accuracy than the state-of-the-art PatchCleanser, while maintaining strong performance across varying patch sizes and locations. Our results highlight the promise of combining interpretability with robustness and suggest concept-driven defenses as a scalable strategy for securing machine learning models against adversarial patch attacks.
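The core mechanism, suppressing the concepts a model's decision is most sensitive to, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the CAV construction follows the standard TCAV recipe (a linear classifier separating concept activations from random activations, whose unit normal is the CAV), while the orthogonal-projection "suppression" step and all function names here are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_acts, random_acts):
    """Fit a linear classifier separating concept vs. random activations;
    the CAV is the unit normal of its decision boundary (TCAV-style)."""
    X = np.vstack([concept_acts, random_acts])
    y = np.array([1] * len(concept_acts) + [0] * len(random_acts))
    clf = LogisticRegression().fit(X, y)
    v = clf.coef_.ravel()
    return v / np.linalg.norm(v)

def suppress_concept(acts, cav):
    """Remove each activation's component along the CAV direction.
    (One plausible suppression operator; the paper's exact rule may differ.)"""
    return acts - np.outer(acts @ cav, cav)

# Toy demo: synthetic activations where the "concept" lies along axis 0.
rng = np.random.default_rng(0)
concept_dir = np.array([1.0, 0.0, 0.0, 0.0])
concept_acts = rng.normal(size=(50, 4)) + 3.0 * concept_dir
random_acts = rng.normal(size=(50, 4))

cav = compute_cav(concept_acts, random_acts)
acts = rng.normal(size=(10, 4))
cleaned = suppress_concept(acts, cav)
# After suppression, activations are orthogonal to the CAV direction.
print(np.abs(cleaned @ cav).max())
```

In a full defense one would rank CAVs by how strongly the (possibly patched) input's prediction depends on them and project out only the most sensitive directions before the final classification layers; the projection above is the simplest such operator.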