🤖 AI Summary
This work addresses the limitations of existing image attackability metrics, which are typically dependent on target models and lack visual interpretability, thereby hindering effective assessment of image sensitivity to adversarial attacks in black-box settings. To overcome these challenges, the authors propose OTI, a model-agnostic attackability metric that requires no prior knowledge of the target model. OTI leverages semantic segmentation to isolate object regions and integrates texture intensity with the high-frequency characteristics of adversarial perturbations to construct a model-independent vulnerability indicator. This approach is the first to simultaneously achieve model agnosticism and visual interpretability in attackability evaluation, and it theoretically establishes a connection between texture intensity and decision boundary sensitivity. Experimental results demonstrate that OTI accurately and efficiently predicts image attackability across diverse attack scenarios while providing intuitive visual explanations.
📝 Abstract
Despite the tremendous success of neural networks, benign images can be corrupted by adversarial perturbations to deceive these models. Intriguingly, images differ in their attackability. Specifically, given an attack configuration, some images are easily corrupted, whereas others are more resistant. Evaluating image attackability has important applications in active learning, adversarial training, and attack enhancement. This has prompted growing interest in developing attackability measures. However, existing methods are scarce and suffer from two major limitations: (1) They rely on a model proxy to provide prior knowledge (e.g., gradients or minimal perturbation) to extract model-dependent image features. Unfortunately, in practice, many task-specific models are not readily accessible. (2) The extracted features characterizing image attackability lack visual interpretability, obscuring their direct relationship with the images. To address these limitations, we propose Object Texture Intensity (OTI), a novel model-free and visually interpretable measure that quantifies an image's attackability as the texture intensity of its semantic object. Theoretically, we describe the principles of OTI from the perspectives of decision boundaries as well as the mid- and high-frequency characteristics of adversarial perturbations. Comprehensive experiments demonstrate that OTI is effective and computationally efficient. In addition, our OTI provides the adversarial machine learning community with a visual understanding of attackability.
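The core idea described above — scoring an image by the high-frequency texture energy inside its segmented object region — can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the choice of a Laplacian high-pass filter and mean-absolute-energy aggregation are assumptions, and the object mask is presumed to come from an off-the-shelf semantic segmentation model.

```python
import numpy as np

def object_texture_intensity(image: np.ndarray, mask: np.ndarray) -> float:
    """Illustrative texture-intensity score: mean high-frequency
    (Laplacian) energy inside the segmented object region.

    image: 2-D grayscale array; mask: boolean array of the same shape
    (assumed to be produced by a semantic segmentation model).
    """
    # 3x3 Laplacian kernel as a simple high-pass filter (an assumption;
    # the paper's exact frequency decomposition may differ).
    k = np.array([[0, 1, 0],
                  [1, -4, 1],
                  [0, 1, 0]], dtype=float)
    h, w = image.shape
    padded = np.pad(image.astype(float), 1, mode="edge")
    high_pass = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            high_pass += k[dy, dx] * padded[dy:dy + h, dx:dx + w]
    # Aggregate high-frequency energy only over the object pixels.
    return float(np.abs(high_pass[mask.astype(bool)]).mean())
```

Under this sketch, an image whose object region is richly textured (high local intensity variation) receives a higher score, i.e., is predicted to be more attackable, while a smooth, low-texture object yields a score near zero.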