Adapting to Evolving Adversaries with Regularized Continual Robust Training

📅 2025-02-06
🤖 AI Summary
To address the degradation of model robustness under continually emerging adversarial attacks, this paper proposes the Continual Robust Training (CRT) framework. CRT theoretically establishes a robustness transfer boundary based on perturbation distances in the logit space—the first such characterization—and designs a lightweight distance regularization mechanism that jointly enhances robustness against both historical and novel attacks without significant computational overhead. The framework is compatible with ℓₚ-based adversarial training and multi-stage fine-tuning, requiring no architectural modifications. Evaluated across over 100 attack combinations on CIFAR-10, CIFAR-100, and ImageNette, CRT achieves substantial improvements in cross-attack robust accuracy while incurring negligible increases in training cost. The implementation is publicly available.

📝 Abstract
Robust training methods typically defend against specific attack types, such as Lp attacks with fixed budgets, and rarely account for the fact that defenders may encounter new attacks over time. A natural solution is to adapt the defended model to new adversaries as they arise via fine-tuning, a method which we call continual robust training (CRT). However, when implemented naively, fine-tuning on new attacks degrades robustness on previous attacks. This raises the question: how can we improve the initial training and fine-tuning of the model to simultaneously achieve robustness against previous and new attacks? We present theoretical results which show that the gap in a model's robustness against different attacks is bounded by how far each attack perturbs a sample in the model's logit space, suggesting that regularizing with respect to this logit space distance can help maintain robustness against previous attacks. Extensive experiments on 3 datasets (CIFAR-10, CIFAR-100, and ImageNette) and over 100 attack combinations demonstrate that the proposed regularization improves robust accuracy with little overhead in training time. Our findings and open-source code lay the groundwork for the deployment of models robust to evolving attacks.
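The abstract's key idea is to regularize the distance, in logit space, between a sample perturbed by a new attack and the same sample perturbed by a previous attack. A minimal sketch of such a regularized objective (hypothetical function names; the exact loss form, the ℓ₂ distance, and the weight `lam` are illustrative assumptions, not the paper's precise formulation):

```python
import numpy as np

def crt_loss(logits_new, logits_old, label, lam=1.0):
    """Sketch of a CRT-style regularized objective: cross-entropy on the
    new attack's logits plus a penalty on the logit-space distance
    between the new and old attacks' logits."""
    # numerically stable log-softmax over the new-attack logits
    z = logits_new - logits_new.max()
    log_probs = z - np.log(np.exp(z).sum())
    ce = -log_probs[label]  # cross-entropy for the true class
    # logit-space distance between the two attacks' outputs
    dist = np.linalg.norm(logits_new - logits_old)
    return ce + lam * dist

# toy example: logits of one sample under a new and an old attack
new = np.array([2.0, 0.5, -1.0])
old = np.array([1.5, 0.7, -0.8])
loss = crt_loss(new, old, label=0, lam=0.5)
```

When the two attacks produce identical logits the penalty vanishes, so the objective reduces to standard adversarial training on the new attack; the penalty grows as the attacks pull the model's outputs apart, which is the gap the paper's bound ties to the difference in robustness.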
Problem

Research questions and friction points this paper is trying to address.

Adapting to evolving adversarial attacks
Maintaining robustness against old and new attacks
Regularizing with logit space distance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continual robust training against evolving attacks
Regularization based on logit space distance
Improved robustness across multiple datasets
Sihui Dai
Department of Electrical and Computer Engineering, Princeton University
Christian Cianfarani
Department of Computer Science, University of Chicago
A. Bhagoji
Department of Computer Science, University of Chicago
Vikash Sehwag
Google DeepMind; Princeton University
Multimodal AI · RLHF · AI Safety & Alignment · Security & Privacy
Prateek Mittal
Professor, Princeton University
Security and Privacy · Systems and Networking · Machine Learning