🤖 AI Summary
Deep neural networks remain vulnerable to adversarial perturbations, and existing robust training paradigms lack a unified formulation, reproducible pipelines, and theoretical guarantees. To address this, we propose an end-to-end differentiable robust training framework: an input-adaptive, learnable perturbation generator is embedded directly in the training loop, removing the need to pre-specify attack types. The method combines gradient-driven bilevel optimization, implicit differentiation, and Wasserstein-based adversarial constraints to support dynamic, projection-based perturbation updates. We give a rigorous theoretical analysis that proves convergence and establishes tighter robust generalization bounds. Empirically, on CIFAR-10/100 and ImageNet subsets, the approach improves robust accuracy against strong attacks (e.g., PGD, AutoAttack) by 5.2% on average while degrading standard (clean-data) accuracy by at most 0.8%.
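As a rough formalization of the bilevel structure named above (the notation is my own reconstruction, not the paper's), the classifier parameters minimize the loss under perturbations produced by a generator that maximizes it, subject to a Wasserstein budget:

```latex
% Assumed notation: f_theta = classifier, g_phi = perturbation generator,
% W_p = p-Wasserstein distance, epsilon = perturbation budget.
% Outer problem: train the classifier against the adapted generator.
\min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[\ell\big(f_{\theta}\big(x + g_{\phi^{*}(\theta)}(x)\big),\, y\big)\Big]
% Inner problem: the generator maximizes the loss under a Wasserstein
% constraint on the induced perturbed input distribution.
\phi^{*}(\theta) \in \arg\max_{\phi}\;
  \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[\ell\big(f_{\theta}\big(x + g_{\phi}(x)\big),\, y\big)\Big]
\quad \text{s.t.}\quad
  W_{p}\big(\mathrm{law}(x + g_{\phi}(x)),\, \mathrm{law}(x)\big) \le \epsilon
```

In this reading, implicit differentiation would supply the gradient of the outer loss through the inner solution φ*(θ) without unrolling the inner maximization.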
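To make the training-loop shape concrete, here is a minimal PyTorch sketch under heavy simplifications: `PerturbationGenerator` and `robust_training_step` are hypothetical names, an l∞ projection stands in for the paper's Wasserstein constraint, and a one-step alternating update replaces implicit differentiation.

```python
# Minimal sketch of the alternating bilevel loop described above.
# Assumptions for illustration only: the generator architecture, the
# l-inf projection (a simple stand-in for the Wasserstein constraint),
# and the single inner step (a cheap proxy for implicit differentiation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerturbationGenerator(nn.Module):
    """Input-adaptive generator: maps an image to a bounded perturbation."""
    def __init__(self, channels=3, eps=8 / 255):
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, x):
        # tanh keeps the raw output in (-1, 1); scaling by eps acts as a
        # projection onto an l-inf ball of radius eps.
        return self.eps * torch.tanh(self.net(x))

def robust_training_step(model, generator, opt_model, opt_gen, x, y):
    # Inner step: update the generator to *maximize* the classifier loss
    # (minimize its negation).
    delta = generator(x)
    inner_loss = -F.cross_entropy(model(x + delta), y)
    opt_gen.zero_grad()
    inner_loss.backward()
    opt_gen.step()

    # Outer step: update the classifier on perturbations re-sampled from
    # the freshly updated generator; detach them so gradients flow only
    # into the classifier.
    with torch.no_grad():
        delta = generator(x)
    outer_loss = F.cross_entropy(model(torch.clamp(x + delta, 0, 1)), y)
    opt_model.zero_grad()
    outer_loss.backward()
    opt_model.step()
    return outer_loss.item()
```

In practice the inner maximization would typically run for several steps per outer step, and the implicit-gradient correction through the inner solution would be added when updating the classifier.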