🤖 AI Summary
Diffusion-based adversarial examples transfer poorly across models and generalize poorly beyond conventional image classification. To address this, we propose a unified framework that combines latent-space optimization with transferability enhancement, generating highly imperceptible, cross-task adversarial examples through image editing. The approach incorporates conventional transfer-enhancing strategies, such as momentum-based iterative optimization and ensemble gradient computation, directly into the diffusion latent-space optimization pipeline, while jointly optimizing a fully differentiable, end-to-end perturbation generation mechanism based on image editing. Experiments show substantial improvements in cross-model attack success rates on tasks beyond standard classification, notably Deepfake detection. The method won first place in the "1st Adversarial Attacks on Deepfake Detectors: A Challenge in the Era of AI-Generated Media" competition at ACM MM 2025, which supports its effectiveness and generalization in a competitive setting.
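To make the two transfer-enhancing strategies named above concrete, the following is a minimal sketch of momentum-based iterative optimization with ensemble-averaged gradients applied to a latent vector. It is not the authors' implementation: the linear `decoder` stands in for a diffusion decoder, the linear-logistic `detectors` stand in for surrogate Deepfake detectors, and the sign step, the `eps` bound on the latent edit, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def mean_loss(z, decoder, detectors):
    """Mean of -log(1 - P(fake)) over surrogates; lower means 'looks real'."""
    x = decoder @ z
    # -log(1 - sigmoid(s)) equals softplus(s) = log(1 + exp(s))
    return float(np.mean([np.logaddexp(0.0, w @ x) for w in detectors]))


def ensemble_grad(z, decoder, detectors):
    """Average the latent-space gradient of the loss over all surrogates."""
    x = decoder @ z
    grad = np.zeros_like(z)
    for w in detectors:
        p_fake = 1.0 / (1.0 + np.exp(-(w @ x)))
        # d softplus(s)/ds = sigmoid(s); chain rule through the linear decoder
        grad += decoder.T @ (p_fake * w)
    return grad / len(detectors)


def momentum_latent_attack(z0, decoder, detectors,
                           steps=20, alpha=0.05, mu=1.0, eps=0.5):
    """Momentum sign-descent on the latent, kept within eps of z0 (assumed bound)."""
    z = z0.copy()
    g = np.zeros_like(z0)
    for _ in range(steps):
        grad = ensemble_grad(z, decoder, detectors)
        # Accumulate the L1-normalized gradient, as in momentum iterative attacks
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)
        z = z - alpha * np.sign(g)
        # Project back so the latent edit stays small
        z = z0 + np.clip(z - z0, -eps, eps)
    return z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    decoder = rng.normal(size=(16, 8))                   # hypothetical decoder
    detectors = [rng.normal(size=16) for _ in range(3)]  # hypothetical surrogates
    z0 = rng.normal(size=8)
    z_adv = momentum_latent_attack(z0, decoder, detectors)
    print(mean_loss(z0, decoder, detectors), mean_loss(z_adv, decoder, detectors))
```

Averaging gradients over several surrogates discourages overfitting to any single detector, and the momentum term smooths the update direction across iterations; both are commonly credited with improving transfer to unseen models. In a real pipeline the gradient would come from backpropagating through the diffusion model rather than from a closed-form expression.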
📝 Abstract
Diffusion-based methods that generate adversarial examples through image editing are rapidly gaining popularity, driven by the powerful image generation capabilities of diffusion models. However, because they rely on the discriminative capability of the diffusion model, these methods often struggle to generalize beyond conventional image classification to tasks such as Deepfake detection. Moreover, traditional strategies for enhancing adversarial example transferability are difficult to adapt to these methods. To address these challenges, we propose a unified framework that incorporates traditional transferability enhancement strategies into diffusion model-based adversarial example generation via image editing, enabling their application across a wider range of downstream tasks. Our method won first place in the "1st Adversarial Attacks on Deepfake Detectors: A Challenge in the Era of AI-Generated Media" competition at ACM MM 2025, which validates the effectiveness of our approach.