🤖 AI Summary
Existing robustness evaluations of black-box face authentication systems against adversarial patch attacks—particularly impersonation attacks—are limited by low attack success rates, high query overhead, and overly strong assumptions about attacker capabilities. To address these limitations, we propose the first diffusion model-based semantic-level adversarial patch generation framework. Our method introduces interpretable semantic perturbations in the latent space, combining an attention disruption mechanism with a targeted feature-space loss function to precisely steer the generated patch toward the target identity. Additionally, we incorporate a black-box query optimization strategy to significantly reduce API access costs. Extensive experiments across multiple mainstream face recognition models demonstrate that our approach achieves an average attack success rate improvement of 45.66% (all exceeding 40%) while reducing query counts by approximately 40%, outperforming state-of-the-art methods by a substantial margin.
📝 Abstract
Given the need to evaluate the robustness of face recognition (FR) models, many efforts have focused on adversarial patch attacks that mislead FR models by introducing localized perturbations. Impersonation attacks are a significant threat because adversarial perturbations allow attackers to disguise themselves as legitimate users. This can lead to severe consequences, including data breaches, system damage, and misuse of resources. However, research on such attacks in FR remains limited. Existing adversarial patch generation methods exhibit limited efficacy in impersonation attacks due to (1) the need for high attacker capabilities, (2) low attack success rates, and (3) excessive query requirements. To address these challenges, we propose a novel method, SAP-DIFF, that leverages diffusion models to generate adversarial patches via semantic perturbations in the latent space rather than direct pixel manipulation. We introduce an attention disruption mechanism to generate features unrelated to the original face, facilitating the creation of adversarial samples, and a directional loss function to guide perturbations toward the target identity's feature space, thereby enhancing attack effectiveness and efficiency. Extensive experiments on popular FR models and datasets demonstrate that our method outperforms state-of-the-art approaches, achieving an average attack success rate improvement of 45.66% (all exceeding 40%) and a reduction in the number of queries by about 40% compared to the SOTA approach.
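The directional loss described above steers the adversarial embedding toward the target identity in feature space. A minimal sketch of one common formulation—minimizing one minus the cosine similarity between the patched face's embedding and the target identity's embedding—is shown below. Note this is an illustrative assumption, not the paper's exact loss; the embedding vectors here are toy stand-ins for the output of an FR model.

```python
import numpy as np

def directional_loss(adv_embedding, target_embedding):
    """Illustrative directional loss: push the embedding of the patched face
    toward the target identity's embedding by minimizing 1 - cosine similarity.
    (A hedged sketch; SAP-DIFF's actual loss may be formulated differently.)"""
    a = adv_embedding / np.linalg.norm(adv_embedding)
    t = target_embedding / np.linalg.norm(target_embedding)
    return 1.0 - float(np.dot(a, t))

# Toy check: loss is 0 when the embeddings align, 2 when they are opposite.
target = np.array([1.0, 0.0, 0.0])
print(directional_loss(target, target))   # 0.0 (perfect impersonation direction)
print(directional_loss(-target, target))  # 2.0 (worst case)
```

Minimizing such a loss during patch generation pulls the FR model's verdict toward the target identity rather than merely away from the source identity, which is what distinguishes impersonation from dodging attacks.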