Memory Efficient Full-gradient Attacks (MEFA) Framework for Adversarial Defense Evaluations

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Because of memory constraints, existing white-box attacks often fall back on approximate gradients when evaluating iterative stochastic purification defenses, which weakens the attack and leads to overestimated robustness. This work proposes a memory-efficient full-gradient attack framework that combines gradient checkpointing with a protocol for controlling randomness. According to the authors, this enables, for the first time, exact end-to-end white-box attacks against long-trajectory stochastic defenses such as diffusion- and Langevin-based purification. The method achieves state-of-the-art attack performance under both ℓ∞ and ℓ₂ norms, uncovers vulnerabilities missed by approximate-gradient approaches, and supports out-of-distribution robustness analysis, making robustness evaluation substantially more reliable.

📝 Abstract

This work studies the robustness evaluation of iterative stochastic purification defenses under white-box adversarial attacks. Our key technical insight is that gradient checkpointing makes exact end-to-end gradient computation through long purification trajectories practical by trading additional recomputation for substantially lower memory usage. This enables full-gradient adaptive attacks against diffusion- and Langevin-based purification defenses, where prior evaluations often resort to approximate backpropagation due to memory constraints. These approximations can weaken the attack signal and risk overestimating robustness. In parallel, stochasticity in iterative purification is frequently under-controlled, even though different purification trajectories can substantially change reported robustness metrics. Building on this insight, we introduce a memory-efficient full-gradient evaluation framework for stochastic purification defenses. The framework combines checkpointed backpropagation with evaluation protocols that control stochastic variability, thereby reducing memory bottlenecks while preserving exact gradients. We evaluate diffusion-based purification and Langevin sampling with Energy-Based Models (EBMs), demonstrating that full-gradient attacks uncover vulnerabilities missed by approximate-gradient evaluations. Our framework yields stronger, state-of-the-art $\ell_{\infty}$ and $\ell_{2}$ white-box attacks and further supports probing out-of-distribution robustness. Overall, our results show that exact-gradient evaluation is essential for reliable benchmarking of iterative stochastic defenses.
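
The idea of checkpointed backpropagation through a stochastic trajectory can be illustrated with a toy scalar example. This is a minimal sketch and not the paper's implementation: the energy function, the constants `ETA` and `SIGMA`, and the helper names (`noise`, `step`, `grad_full`, `grad_checkpointed`) are all hypothetical. It stores only every `every`-th state on the forward pass, recomputes each segment during the backward pass, and derives the noise from `(seed, t)` so that recomputation replays exactly the same trajectory.

```python
import random

ETA, SIGMA = 0.05, 0.1  # hypothetical step size and noise scale


def noise(seed, t):
    # Noise is a pure function of (seed, t), so a recomputed step
    # sees the same random draw as the original forward pass.
    return random.Random(seed * 1_000_003 + t).gauss(0.0, 1.0)


def step(x, seed, t):
    # One Langevin-style update on the toy energy E(x) = x^4/4 - x^2/2.
    return x - ETA * (x**3 - x) + SIGMA * noise(seed, t)


def step_jacobian(x):
    # Derivative of step(x, ...) with respect to x.
    return 1.0 - ETA * (3.0 * x**2 - 1.0)


def grad_full(x0, steps, seed):
    # Baseline: keep every intermediate state, O(steps) memory.
    xs = [x0]
    for t in range(steps):
        xs.append(step(xs[-1], seed, t))
    g = 1.0
    for t in reversed(range(steps)):
        g *= step_jacobian(xs[t])
    return xs[-1], g  # final state and d x_T / d x_0


def grad_checkpointed(x0, steps, seed, every):
    # Forward pass: keep only every `every`-th state.
    ckpts = {0: x0}
    x = x0
    for t in range(steps):
        x = step(x, seed, t)
        if (t + 1) % every == 0 and t + 1 < steps:
            ckpts[t + 1] = x
    out = x
    # Backward pass: rebuild one segment at a time from its checkpoint,
    # then chain the per-step Jacobians in reverse order.
    g = 1.0
    for start in sorted(ckpts, reverse=True):
        end = min(start + every, steps)
        seg = [ckpts[start]]
        for t in range(start, end - 1):
            seg.append(step(seg[-1], seed, t))
        for xt in reversed(seg):
            g *= step_jacobian(xt)
    return out, g


if __name__ == "__main__":
    print(grad_full(0.3, 50, seed=7))
    print(grad_checkpointed(0.3, 50, seed=7, every=8))
```

Both functions return the same gradient, because the checkpointed version replays the identical trajectory instead of approximating it. With a checkpoint every k of T steps, the live state count is roughly T/k + k rather than T, at the cost of about one extra forward pass. In a deep-learning framework the same pattern would normally be delegated to the framework's own checkpointing utility rather than written by hand.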

Problem

Research questions and friction points this paper is trying to address.

adversarial defense evaluation

stochastic purification

gradient approximation

memory bottleneck

robustness overestimation

Innovation

Methods, ideas, or system contributions that make the work stand out.

gradient checkpointing

full-gradient attack

stochastic purification