🤖 AI Summary
Existing reversible adversarial attack methods are largely restricted to white-box settings, suffering from poor transferability, high query complexity, and limited practicality in black-box scenarios. To address these challenges for image privacy protection, this paper proposes a two-stage fusion framework: (1) generating a highly transferable initial perturbation under white-box conditions; and (2) refining it via a memory-augmented black-box optimization strategy for efficient perturbation tuning. To the best of our knowledge, this is the first work to successfully deploy reversible adversarial attacks on real-world commercial models while preserving both strong attack efficacy and perfect image recoverability. Experiments demonstrate a 99.0% black-box attack success rate and 100% original-image recovery, substantially outperforming prior black-box reversible methods. Our approach establishes a novel paradigm for practical, dynamic privacy protection in realistic deployment environments.
📝 Abstract
In the field of digital security, Reversible Adversarial Examples (RAE) combine adversarial attacks with reversible data hiding techniques to effectively protect sensitive data and prevent unauthorized analysis by malicious Deep Neural Networks (DNNs). However, existing RAE techniques primarily focus on white-box attacks, lacking a comprehensive evaluation of their effectiveness in black-box scenarios. This limitation impedes their broader deployment in complex, dynamic environments. Furthermore, traditional black-box attacks are often characterized by poor transferability and high query costs, significantly limiting their practical applicability. To address these challenges, we propose the Dual-Phase Merging Transferable Reversible Attack method, which generates highly transferable initial adversarial perturbations on a white-box model and employs a memory-augmented black-box strategy to effectively mislead target models. Experimental results demonstrate the superiority of our approach, achieving a 99.0% attack success rate and 100% recovery rate in black-box scenarios, highlighting its robustness in privacy protection. Moreover, we successfully implemented a black-box attack on a commercial model, further substantiating the potential of this approach for practical use.
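The dual-phase structure described above can be illustrated with a heavily simplified sketch: phase 1 uses gradient access to a white-box surrogate to build a transferable starting perturbation, and phase 2 refines it with queries to the black-box target, keeping a memory of perturbation directions that previously helped. All names here (`surrogate_loss`, `black_box_query`, `dual_phase_attack`, the toy models, step sizes, and the 50/50 memory-reuse heuristic) are illustrative assumptions, not the paper's actual algorithm or API; real RAE methods additionally embed recovery information via reversible data hiding, which is omitted.

```python
import random

def surrogate_loss(x):
    # Toy white-box surrogate: loss grows with per-pixel distance from 0.5.
    return sum((xi - 0.5) ** 2 for xi in x)

def surrogate_grad_sign(x):
    # Analytic gradient sign of the toy loss (white-box access assumed).
    return [1.0 if xi > 0.5 else -1.0 for xi in x]

def black_box_query(x, threshold=3.0):
    # Stand-in for querying the target model's API: True = misclassified.
    return surrogate_loss(x) > threshold

def dual_phase_attack(x, eps=0.5, step=0.05, budget=200, seed=0):
    rng = random.Random(seed)
    n = len(x)
    # Phase 1: transferable initial perturbation from the white-box surrogate
    # (an FGSM-style signed-gradient step, used here purely as a toy).
    adv = [xi + eps * g for xi, g in zip(x, surrogate_grad_sign(x))]
    memory = []  # directions that previously increased the loss
    # Phase 2: query-limited refinement with memory-guided random search.
    for _ in range(budget):
        if black_box_query(adv):
            return adv, True  # target fooled within the query budget
        if memory and rng.random() < 0.5:
            d = rng.choice(memory)  # reuse a remembered useful direction
        else:
            d = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        cand = [a + step * di for a, di in zip(adv, d)]
        # Accept only moves that increase the (proxy) loss, and memorize them.
        if surrogate_loss(cand) > surrogate_loss(adv):
            memory.append(d)
            adv = cand
    return adv, black_box_query(adv)
```

Because updates are only accepted when they increase the loss, the refinement phase is monotone, and the memory biases later queries toward directions that have already proven useful, which is the intuition behind reducing query cost.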