SAP-DIFF: Semantic Adversarial Patch Generation for Black-Box Face Recognition Models via Diffusion Models

📅 2025-02-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing black-box face authentication systems suffer from limited robustness evaluation against adversarial patch attacks—particularly impersonation attacks—due to low attack success rates, high query overhead, and overly strong assumptions about attacker capabilities. To address these limitations, we propose the first diffusion model-based semantic-level adversarial patch generation framework. Our method introduces interpretable semantic perturbations in the latent space, combining an attention disruption mechanism with a targeted feature-space loss function to precisely steer the model toward the target identity. Additionally, we incorporate a black-box query optimization strategy to significantly reduce API access costs. Extensive experiments across multiple mainstream face recognition models demonstrate that our approach achieves an average attack success rate improvement of 45.66% (with per-model gains all exceeding 40%) while reducing query counts by approximately 40%, outperforming state-of-the-art methods by a substantial margin.

📝 Abstract
Given the need to evaluate the robustness of face recognition (FR) models, many efforts have focused on adversarial patch attacks that mislead FR models by introducing localized perturbations. Impersonation attacks are a significant threat because adversarial perturbations allow attackers to disguise themselves as legitimate users. This can lead to severe consequences, including data breaches, system damage, and misuse of resources. However, research on such attacks in FR remains limited. Existing adversarial patch generation methods exhibit limited efficacy in impersonation attacks due to (1) the need for high attacker capabilities, (2) low attack success rates, and (3) excessive query requirements. To address these challenges, we propose SAP-DIFF, a novel method that leverages diffusion models to generate adversarial patches via semantic perturbations in the latent space rather than direct pixel manipulation. We introduce an attention disruption mechanism that generates features unrelated to the original face, facilitating the creation of adversarial samples, and a directional loss function that guides perturbations toward the target identity's feature space, thereby enhancing attack effectiveness and efficiency. Extensive experiments on popular FR models and datasets demonstrate that our method outperforms state-of-the-art approaches, achieving an average attack success rate improvement of 45.66% (with per-model gains all exceeding 40%) and a reduction in the number of queries by about 40% compared to the SOTA approach.
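The directional loss the abstract describes—steering perturbations toward the target identity's feature space—is commonly realized as a cosine-similarity objective between face embeddings. The sketch below illustrates that idea only; the function name, signature, and exact formulation are assumptions, not the paper's actual implementation.

```python
import numpy as np

def directional_loss(adv_embedding: np.ndarray, target_embedding: np.ndarray) -> float:
    """Illustrative directional loss for an impersonation attack.

    Minimizing this value pushes the adversarial face's embedding toward
    the target identity's embedding: the loss is 1 - cosine similarity,
    so it is 0 when the embeddings are perfectly aligned and grows as
    they diverge. (Hypothetical sketch; not SAP-DIFF's exact loss.)
    """
    a = adv_embedding / np.linalg.norm(adv_embedding)
    t = target_embedding / np.linalg.norm(target_embedding)
    return 1.0 - float(np.dot(a, t))
```

In a black-box setting, the embeddings would come from API queries to the victim FR model, which is why reducing the query count per optimization step matters.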
Problem

Research questions and friction points this paper is trying to address.

Enhances adversarial patch attack effectiveness
Reduces query requirements in face recognition
Improves impersonation attack success rates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion models generate adversarial patches
Attention disruption creates unrelated features
Directional loss guides target identity perturbations
Mingsi Wang
Institute of Information Engineering, Chinese Academy of Sciences
AI security
Shuaiyin Yao
Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Chang Yue
Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Lijie Zhang
Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Guozhu Meng
Associate Professor with Chinese Academy of Sciences
mobile security; program analysis; AI privacy and security