🤖 AI Summary
Existing generative adversarial attacks fail to effectively leverage semantic representations from intermediate generator layers, resulting in poor alignment between perturbations and object-salient regions and limiting cross-model transferability. To address this, we propose a Semantic Structure-Aware Generative Attack framework: (1) employing a Mean Teacher mechanism with temporally smoothed features as supervision to enforce semantic consistency between teacher and student networks via feature distillation; and (2) for the first time, exploiting the semantically rich early-layer activations of the generator to anchor intermediate modules for progressive perturbation synthesis. Our method achieves significant improvements over state-of-the-art approaches across multiple models, tasks, and domains. We also introduce the Accidental Correction Rate (ACR), a novel metric that quantitatively captures an overlooked aspect of attack effectiveness. This work establishes a new paradigm for generative adversarial attacks, offering both interpretability and superior transferability.
📝 Abstract
Generative adversarial attacks train a perturbation generator on a white-box surrogate model and subsequently apply the crafted perturbations to unseen black-box victim models. In contrast to iterative attacks, these methods deliver superior inference-time efficiency, scalability, and transferability; however, existing studies have not fully exploited the representational capacity of generative models to preserve and harness semantic information. Specifically, the intermediate activations of the generator encode rich semantic features, such as object boundaries and coarse shapes, that remain under-exploited, thereby limiting the alignment of perturbations with object-salient regions, which is critical for adversarial transferability. To remedy this, we introduce a semantic structure-aware attack framework based on the Mean Teacher, which serves as a temporally smoothed feature reference. With this smoothed reference, we further enforce semantic consistency between the early-layer activations of the student and those of the semantically rich teacher via feature distillation. By anchoring perturbation synthesis to the semantically salient early intermediate blocks of the generator, guided by empirical findings, our method focuses progressive adversarial perturbation on regions that substantially enhance adversarial transferability. Extensive experiments over diverse models, domains, and tasks demonstrate consistent improvements relative to state-of-the-art generative attacks, comprehensively evaluated using conventional metrics and our newly proposed Accidental Correction Rate (ACR).
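The Mean Teacher mechanism described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function names, the momentum value, and the plain L2 form of the feature-distillation loss are all assumptions for exposition.

```python
import numpy as np

def ema_update(teacher_params, student_params, momentum=0.99):
    """Mean Teacher update: the teacher's weights are an exponential
    moving average (temporal smoothing) of the student's weights.
    The momentum value 0.99 is an illustrative choice."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]

def feature_distillation_loss(student_feats, teacher_feats):
    """Consistency loss between early-layer activations of the student
    generator and the smoothed teacher reference. A plain mean-squared
    error is assumed here as the distillation objective."""
    return float(np.mean((np.asarray(student_feats)
                          - np.asarray(teacher_feats)) ** 2))
```

After each training step, the student is updated by the adversarial objective while the teacher only tracks the student via `ema_update`, so the teacher's features change slowly and serve as the stable semantic reference that the distillation loss pulls the student's early-layer activations toward.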