🤖 AI Summary
This work addresses the vulnerability of cross-camera face recognition systems to adversarial evasion and impersonation attacks by proposing a conditional encoder-decoder framework for generating adversarial patches. By fusing multi-scale features from both source and target images, the method achieves evasion and impersonation in a single forward pass, and it leverages a pre-trained latent diffusion model to improve the visual realism of the patches for physical-world deployment. The approach optimizes a push-pull dual objective and uses activation-map clustering to uncover the facial features the attack exploits. Experimental results show that the method reduces mean average precision (mAP) to 0.4% in both white-box and black-box settings, exhibits strong cross-model generalization, and achieves a 27% impersonation success rate on CelebA-HQ, competitive with existing patch-based approaches.
📝 Abstract
Facial identification systems are increasingly deployed in surveillance, yet their vulnerability to adversarial evasion and impersonation attacks poses a critical risk. This paper introduces a novel framework for generating adversarial patches capable of both evasion and impersonation attacks against deep re-identification models across non-overlapping cameras. Unlike prior approaches that require iterative patch optimization for each target, our method employs a conditional encoder-decoder network to synthesize adversarial patches in a single forward pass, guided by multi-scale features from source and target images. The patches are optimized with a dual adversarial objective comprising pull and push terms. To enhance imperceptibility and aid physical deployment, we further integrate naturalistic patch generation using pre-trained latent diffusion models. Experiments on standard pedestrian re-identification (Market-1501, DukeMTMC-reID) and facial recognition (CelebA-HQ, PubFig) benchmarks demonstrate the effectiveness of the proposed method. Our adversarial evasion attacks reduce mean Average Precision from 90% to 0.4% in white-box settings and from 72% to 0.4% in black-box settings, showing strong cross-model generalization. In targeted impersonation attacks, our framework achieves a success rate of 27% on CelebA-HQ, competitive with other patch-based methods. We further use clustering of activation maps to interpret which features adversarial attacks exploit most, and propose a pathway for future countermeasures. The results highlight the practicality of adversarial patch attacks on retrieval-based systems and underline the urgent need for robust defense strategies.
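The dual pull/push objective described above can be illustrated with a minimal sketch. This is not the paper's exact loss; the function names, the cosine-similarity formulation, and the `alpha`/`beta` weights are assumptions for illustration. The pull term drives the adversarial embedding toward the target identity (impersonation), while the push term drives it away from the source identity (evasion):

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def push_pull_loss(adv_emb, source_emb, target_emb, alpha=1.0, beta=1.0):
    """Illustrative dual adversarial objective (not the paper's exact loss).

    Pull term: minimize distance between the adversarial embedding and the
    target identity (impersonation). Push term: minimize similarity to the
    source identity (evasion). Lower loss = stronger attack on both fronts.
    """
    pull = 1.0 - cosine_sim(adv_emb, target_emb)  # pull toward target
    push = cosine_sim(adv_emb, source_emb)        # push away from source
    return alpha * pull + beta * push

# Toy embeddings: an adversarial embedding that matches the target exactly
# scores lower (better) than one still matching the source.
rng = np.random.default_rng(0)
source = rng.normal(size=128)
target = rng.normal(size=128)
print(push_pull_loss(target, source, target))  # pull term is exactly 0 here
print(push_pull_loss(source, source, target))
```

In a full pipeline, this scalar would be backpropagated through the re-identification model into the encoder-decoder that generates the patch, so a single trained network can serve any source/target pair in one forward pass.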