Unveiling the Backdoor Mechanism Hidden Behind Catastrophic Overfitting in Fast Adversarial Training

📅 2026-04-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses catastrophic overfitting (CO), the abrupt collapse of robustness that can occur during fast adversarial training. The study interprets CO through the lens of backdoor mechanisms, framing it as a weak-trigger variant of an unlearnable task, and unifies CO, backdoor attacks, and unlearnable tasks under one theoretical framework. To validate this perspective, the authors analyze path partitioning, feature prediction discrepancies, and universal class-discriminative triggers. They then propose backdoor-inspired mitigation strategies: vanilla fine-tuning, linear probing, weight re-initialization, and a constraint that suppresses weight outliers. The reported experiments support the theoretical explanation and show that these strategies alleviate catastrophic overfitting and improve robustness across diverse adversarial attacks.
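
The summary mentions a constraint that suppresses weight outliers but does not give its form. The sketch below is one hypothetical way such a penalty could look, not the authors' formulation: it penalizes only the portion of a weight's deviation from the layer mean that exceeds `k` standard deviations. The function name `outlier_penalty` and the threshold `k` are assumptions made for illustration.

```python
import statistics


def outlier_penalty(weights, k=3.0):
    """Hypothetical outlier penalty (an assumption, not the paper's constraint).

    Sums the squared excess of |w - mean| over k standard deviations,
    so weights inside the k-sigma band contribute nothing.
    """
    mean = statistics.fmean(weights)
    std = statistics.pstdev(weights)
    threshold = k * std
    return sum(max(abs(w - mean) - threshold, 0.0) ** 2 for w in weights)


# Synthetic weights, invented for illustration only.
well_behaved = [0.1, -0.1] * 50
with_outlier = well_behaved + [5.0]

penalty_clean = outlier_penalty(well_behaved)
penalty_outlier = outlier_penalty(with_outlier)
```

In a training loop, a term like this would be added to the loss with a small coefficient. Because only the excess beyond the band is penalized, ordinary weights are left untouched.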

📝 Abstract

Fast Adversarial Training (FAT) has attracted significant attention due to its efficiency in enhancing neural network robustness against adversarial attacks. However, FAT is prone to catastrophic overfitting (CO), wherein models overfit to the specific attack used during training and fail to generalize to others. While existing methods introduce diverse hypotheses and propose various strategies to mitigate CO, a systematic and intuitive explanation of CO remains absent. In this work, we interpret CO through the lens of backdoor mechanisms. Through validations on pathway division, diverse feature predictions, and universal class-distinguishable triggers in CO, we conceptualize CO as a weak-trigger variant of unlearnable tasks, unifying CO, backdoor attacks, and unlearnable tasks under a common theoretical framework. Guided by this, we leverage several backdoor-inspired strategies to mitigate CO: (i) recalibrate CO-affected model parameters using vanilla fine-tuning, linear probing, or reinitialization-based techniques; (ii) introduce a weight outlier suppression constraint to regulate abnormal deviations in model weights. Extensive experiments support our interpretation of CO and show the efficacy of the proposed mitigation strategies.
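
The symptom the abstract describes, overfitting to the training attack while failing against others, is usually visible in per-epoch accuracy logs. The following is a minimal sketch of how it could be flagged, assuming accuracy under the single-step training attack (here called FGSM) and under a stronger multi-step attack (here called PGD) is recorded each epoch. The function name, the thresholds `drop` and `min_fgsm`, and the log values are illustrative assumptions, not taken from the paper.

```python
def detect_catastrophic_overfitting(fgsm_acc, pgd_acc, drop=0.2, min_fgsm=0.5):
    """Return the first epoch where PGD accuracy falls by more than `drop`
    from the previous epoch while FGSM accuracy stays at or above `min_fgsm`.

    Returns None if no such epoch exists. Thresholds are illustrative.
    """
    for epoch in range(1, len(pgd_acc)):
        pgd_fell = pgd_acc[epoch - 1] - pgd_acc[epoch] > drop
        fgsm_held = fgsm_acc[epoch] >= min_fgsm
        if pgd_fell and fgsm_held:
            return epoch
    return None


# Synthetic accuracy logs, invented for illustration only.
fgsm_log = [0.40, 0.48, 0.55, 0.85, 0.92]
pgd_log = [0.35, 0.40, 0.43, 0.05, 0.01]

co_epoch = detect_catastrophic_overfitting(fgsm_log, pgd_log)
```

With these synthetic logs the check fires at epoch index 3, where accuracy against the stronger attack collapses while accuracy against the training attack keeps rising.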

Problem

Research questions and friction points this paper is trying to address.

catastrophic overfitting

fast adversarial training

backdoor mechanism

adversarial robustness

unlearnable tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

catastrophic overfitting

backdoor mechanism

fast adversarial training