🤖 AI Summary
Current CNN models are vulnerable to physically realizable multi-patch adversarial attacks, while mainstream defenses either assume single-patch scenarios or become computationally expensive and less robust under multi-patch threats. This paper proposes SpaNN, an attack detector whose computational complexity is independent of the expected number of patches: it applies a set of saliency thresholds to the activations of the victim model's first convolutional layer to build an ensemble of binarized feature maps, clusters the ensemble, and feeds the resulting cluster features to a classifier for attack detection. Because it does not rely on a single fixed saliency threshold, the method is robust against white-box adversarial attacks. Evaluated on four widely used datasets, it outperforms state-of-the-art defenses by up to 11 and 27 percentage points in object detection and image classification, respectively.
📝 Abstract
State-of-the-art convolutional neural network models for object detection and image classification are vulnerable to physically realizable adversarial perturbations, such as patch attacks. Existing defenses have focused, implicitly or explicitly, on single-patch attacks. This leaves their sensitivity to the number of patches an open question and, in the worst case, renders them computationally infeasible or inefficient against attacks consisting of multiple patches. In this work, we propose SpaNN, an attack detector whose computational complexity is independent of the expected number of adversarial patches. The key novelty of the proposed detector is that it builds an ensemble of binarized feature maps by applying a set of saliency thresholds to the neural activations of the first convolutional layer of the victim model. It then performs clustering on the ensemble and uses the cluster features as the input to a classifier for attack detection. Unlike existing detectors, SpaNN does not rely on a fixed saliency threshold for identifying adversarial regions, which makes it robust against white-box adversarial attacks. We evaluate SpaNN on four widely used datasets for object detection and classification, and our results show that SpaNN outperforms state-of-the-art defenses by up to 11 and 27 percentage points in object detection and image classification, respectively. Our code is available at https://github.com/gerkbyrd/SpaNN.
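The pipeline described in the abstract (threshold ensemble, clustering, cluster features) can be illustrated with a minimal sketch. This is not the authors' implementation; the repository linked above is the reference. Everything here is an assumption made for illustration: the function names `binarize`, `connected_components`, and `ensemble_features` are hypothetical, the feature map is taken to be a single 2D grid of non-negative activations with a positive maximum, thresholds are expressed as fractions of that maximum, 4-connected component labeling stands in for the clustering step, and the two features per threshold (cluster count and mean cluster size) are placeholders for whatever cluster features the paper uses.

```python
from collections import deque


def binarize(fmap, beta):
    """Mark cells whose activation is at least beta * max(fmap).

    Assumes fmap is a 2D list of non-negative values with a positive maximum.
    """
    peak = max(max(row) for row in fmap)
    return [[1 if v >= beta * peak else 0 for v in row] for row in fmap]


def connected_components(mask):
    """Return the sizes of 4-connected regions of ones.

    Stand-in for the clustering step; the paper's algorithm may differ.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    sizes = []
    for i in range(h):
        for j in range(w):
            if mask[i][j] == 0 or seen[i][j]:
                continue
            # Breadth-first flood fill from an unvisited active cell.
            seen[i][j] = True
            queue, size = deque([(i, j)]), 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes


def ensemble_features(fmap, betas):
    """Concatenate (cluster count, mean cluster size) over all thresholds."""
    feats = []
    for beta in betas:
        sizes = connected_components(binarize(fmap, beta))
        n = len(sizes)
        feats.extend([n, sum(sizes) / n if n else 0.0])
    return feats


# Toy feature map with two high-activation regions of different strength.
fmap = [
    [0.1, 0.1, 0.1, 0.1, 0.1],
    [0.1, 1.0, 1.0, 0.1, 0.1],
    [0.1, 1.0, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.6],
    [0.1, 0.1, 0.1, 0.6, 0.6],
]
features = ensemble_features(fmap, betas=[0.5, 0.9])
```

In this toy case the weaker region is visible at the lower threshold but disappears at the higher one, so the feature vector changes across the ensemble. A downstream classifier would take such a vector as input; because the per-threshold work depends on the feature map size rather than on how many regions are present, the sketch also hints at why the cost need not grow with the number of patches.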