🤖 AI Summary
Existing robustness evaluation methods predominantly rely on isotropic, global ℓₚ-norm threat models, which fail to capture realistic, semantically meaningful perturbations—such as blur, compression, and occlusion—in vision tasks. To address this limitation, we propose the Projected Displacement (PD) threat model, the first to incorporate *anisotropy* and *locality*: it estimates class-boundary directions locally from training data and projects perturbations onto "unsafe" directions to quantify their hazardous component, thereby separating benign perturbations from harmful ones. PD requires no pretraining, large-model embeddings, or fine-tuning, and natively supports task-driven, region-sensitive, and concept-level prior integration. Evaluated on ImageNet-1k, PD accurately distinguishes benign distortions (e.g., noise, blur) from hazardous ones (those causing label flips), significantly outperforming both ℓₚ-norm and perception-based threat models in robustness assessment fidelity.
📝 Abstract
State-of-the-art machine learning systems are vulnerable to small perturbations to their input, where "small" is defined according to a threat model that assigns a positive threat to each perturbation. Most prior works define a task-agnostic, isotropic, and global threat, like the $\ell_p$ norm, where the magnitude of the perturbation fully determines the degree of the threat and neither the direction of the attack nor its position in space matter. However, common corruptions in computer vision, such as blur, compression, or occlusions, are not well captured by such threat models. This paper proposes a novel threat model called \texttt{Projected Displacement} (PD) to study robustness beyond existing isotropic and global threat models. The proposed threat model measures the threat of a perturbation via its alignment with \textit{unsafe directions}, defined as directions in the input space along which a perturbation of sufficient magnitude changes the ground truth class label. Unsafe directions are identified locally for each input based on observed training data. In this way, the PD threat model exhibits anisotropy and locality. Experiments on ImageNet-1k data indicate that, for any input, the set of perturbations with small PD threat includes \textit{safe} perturbations of large $\ell_p$ norm that preserve the true label, such as noise, blur, and compression, while simultaneously excluding \textit{unsafe} perturbations that alter the true label. Unlike perceptual threat models based on embeddings of large vision models, the PD threat model can be readily computed for arbitrary classification tasks without pre-training or fine-tuning. Furthermore, additional task annotations, such as sensitivity to image regions or concept hierarchies, can be easily integrated into the threat assessment; the PD threat model thus offers practitioners a flexible, task-driven threat specification.
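To make the core idea concrete, the following is a minimal, hypothetical sketch of how a PD-style threat could be computed. It assumes (not taken from the paper) that unsafe directions for an input are approximated by unit vectors pointing from the input toward training samples of a *different* class, and that the threat of a perturbation is its largest positive projection onto any such direction; the function name and signature are illustrative only.

```python
import numpy as np

def pd_threat(x, delta, train_X, train_y, label):
    """Hypothetical sketch of a Projected Displacement-style threat score.

    Unsafe directions are approximated here as unit vectors from the input x
    toward training samples with a different label; the threat of a
    perturbation delta is its largest projection onto any such direction.
    (Assumed simplification for illustration, not the paper's exact method.)
    """
    # Displacements toward training points of other classes act as
    # candidate unsafe directions.
    diffs = train_X[train_y != label] - x
    dirs = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)
    # Alignment of the perturbation with each unsafe direction.
    projections = dirs @ delta
    # Only positive alignment (moving toward another class) is threatening;
    # perturbations orthogonal or opposed to all unsafe directions score 0.
    return float(np.maximum(projections, 0.0).max())
```

On this toy geometry, a perturbation pointing toward another class scores its projected length, while a perturbation of equal norm pointing away from all other classes scores zero, illustrating the anisotropy that an $\ell_p$ norm cannot express.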