🤖 AI Summary
Existing robustness evaluations of black-box face authentication systems against adversarial patch attacks—particularly impersonation attacks—are limited by low attack success rates, high query overhead, and overly strong assumptions about attacker capabilities. To address these limitations, we propose the first diffusion model-based semantic-level adversarial patch generation framework. Our method introduces interpretable semantic perturbations in the latent space, combining an attention disruption mechanism with a targeted feature-space loss function to precisely steer the generated patch toward the target identity. Additionally, we incorporate a black-box query optimization strategy to significantly reduce API access costs. Extensive experiments across multiple mainstream face recognition models demonstrate that our approach achieves an average attack success rate improvement of 45.66% (all exceeding 40%) while reducing query counts by approximately 40%, outperforming state-of-the-art methods by a substantial margin.
📝 Abstract
Given the need to evaluate the robustness of face recognition (FR) models, many efforts have focused on adversarial patch attacks that mislead FR models by introducing localized perturbations. Impersonation attacks are a significant threat because adversarial perturbations allow attackers to disguise themselves as legitimate users. This can lead to severe consequences, including data breaches, system damage, and misuse of resources. However, research on such attacks in FR remains limited. Existing adversarial patch generation methods exhibit limited efficacy in impersonation attacks due to (1) the need for high attacker capabilities, (2) low attack success rates, and (3) excessive query requirements. To address these challenges, we propose a novel method, SAP-DIFF, that leverages diffusion models to generate adversarial patches via semantic perturbations in the latent space rather than direct pixel manipulation. We introduce an attention disruption mechanism to generate features unrelated to the original face, facilitating the creation of adversarial samples, and a directional loss function to guide perturbations toward the target identity's feature space, thereby enhancing attack effectiveness and efficiency. Extensive experiments on popular FR models and datasets demonstrate that our method outperforms state-of-the-art approaches, achieving an average attack success rate improvement of 45.66% (all exceeding 40%) and a reduction in the number of queries by about 40% compared to the SOTA approach.
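The directional loss described above steers the adversarial embedding toward the target identity in feature space. A minimal sketch of one common formulation—minimizing one minus the cosine similarity between the patched face's embedding and the target identity's embedding—is shown below. Note this is an illustrative assumption, not the paper's exact loss; the embedding vectors here are toy stand-ins for the output of an FR model.

```python
import numpy as np

def directional_loss(adv_embedding, target_embedding):
    """Illustrative directional loss: push the embedding of the patched face
    toward the target identity's embedding by minimizing 1 - cosine similarity.
    (A hedged sketch; SAP-DIFF's actual loss may be formulated differently.)"""
    a = adv_embedding / np.linalg.norm(adv_embedding)
    t = target_embedding / np.linalg.norm(target_embedding)
    return 1.0 - float(np.dot(a, t))

# Toy check: loss is 0 when the embeddings align, 2 when they are opposite.
target = np.array([1.0, 0.0, 0.0])
print(directional_loss(target, target))   # 0.0 (perfect impersonation direction)
print(directional_loss(-target, target))  # 2.0 (worst case)
```

Minimizing such a loss during patch generation pulls the FR model's verdict toward the target identity rather than merely away from the source identity, which is what distinguishes impersonation from dodging attacks.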