🤖 AI Summary
Fast adversarial attacks like FGSM are widely considered unsuitable for robust training, largely because of their tendency toward catastrophic overfitting early in training. Method: This work systematically investigates whether FGSM-based adversarial fine-tuning can serve as a low-cost, scalable alternative to expensive PGD-based fine-tuning in robust transfer learning, with a focus on parameter-efficient settings such as linear probing. Contribution/Results: Contrary to conventional wisdom, we demonstrate that FGSM fine-tuning is remarkably stable and avoids catastrophic overfitting at standard perturbation budgets (ε = 4 and ε = 8). Empirical evaluation across multiple benchmarks shows that FGSM fine-tuning incurs only marginal robust accuracy drops relative to PGD (0.39% at ε = 4 and 1.39% at ε = 8) while reducing training time by 75%. To our knowledge, this is the first study to establish FGSM's stability and scalability in adversarial fine-tuning, pointing toward a lightweight paradigm for robust transfer learning.
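For context, the cost gap comes from the structure of the two attacks: FGSM takes a single signed gradient step, while PGD iterates. The standard definitions (notation assumed here, not spelled out in the summary above) are:

$$x_{\text{adv}} = x + \varepsilon \cdot \operatorname{sign}\big(\nabla_x \mathcal{L}(f_\theta(x), y)\big) \quad \text{(FGSM)}$$

$$x^{(k+1)} = \Pi_{\varepsilon}\Big(x^{(k)} + \alpha \cdot \operatorname{sign}\big(\nabla_x \mathcal{L}(f_\theta(x^{(k)}), y)\big)\Big), \quad k = 0, \dots, K-1 \quad \text{(PGD)}$$

where $f_\theta$ is the model, $\mathcal{L}$ the training loss, $\Pi_{\varepsilon}$ the projection onto the $\ell_\infty$ ball of radius $\varepsilon$ around $x$, and $\alpha$, $K$ the PGD step size and iteration count. Each PGD step costs one extra forward/backward pass, so $K$-step PGD is roughly $K\times$ as expensive per batch as FGSM, which is consistent with the reported 75% reduction in training time.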
📝 Abstract
Transfer learning is often used to decrease the computational cost of model training, since fine-tuning lets a downstream task leverage the features learned on the pre-training dataset and quickly adapt them to a new task. This is particularly useful for achieving adversarial robustness, as adversarially training models from scratch is very computationally expensive. However, high robustness in transfer learning still requires adversarial training during the fine-tuning phase, which takes up to an order of magnitude more time than standard fine-tuning. In this work, we revisit the use of the fast gradient sign method (FGSM) in robust transfer learning to reduce the computational cost of adversarial fine-tuning. Surprisingly, we find that FGSM is much more stable in adversarial fine-tuning than when training from scratch. In particular, FGSM fine-tuning does not suffer from catastrophic overfitting at standard perturbation budgets of $\varepsilon=4$ or $\varepsilon=8$. This stability is further enhanced by parameter-efficient fine-tuning methods: with linear probing, FGSM remains stable even up to $\varepsilon=32$. We demonstrate how this stability translates into performance across multiple datasets. Compared to fine-tuning with the more commonly used projected gradient descent (PGD), FGSM loses only 0.39% and 1.39% test robustness on average for $\varepsilon=4$ and $\varepsilon=8$, respectively, while using $4\times$ less training time. FGSM may thus be not only a significantly more efficient alternative to PGD for adversarially robust transfer learning but also a well-performing one.
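To make the setting concrete, a minimal sketch of FGSM adversarial fine-tuning with linear probing might look like the following. This is our own reconstruction under common conventions (a ResNet-50 backbone, pixel-space budget ε = 8/255, SGD on the head only, random placeholder data); none of these specifics come from the abstract.

```python
# Minimal sketch of FGSM adversarial fine-tuning with linear probing.
# Illustrative only: model choice, epsilon scale, and training details
# are assumptions, not taken from the paper.
import torch
import torch.nn.functional as F
import torchvision

def fgsm_attack(model, x, y, eps):
    """Single-step FGSM: perturb x by eps * sign of the input gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    return (x_adv + eps * grad.sign()).clamp(0.0, 1.0).detach()

num_classes = 10  # hypothetical downstream task

# Pre-trained backbone; freeze everything, then attach a fresh linear head.
model = torchvision.models.resnet50(weights="IMAGENET1K_V2")
for p in model.parameters():
    p.requires_grad = False
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)  # linear probe

opt = torch.optim.SGD(model.fc.parameters(), lr=0.01, momentum=0.9)
eps = 8 / 255  # budgets quoted as "eps = 8" conventionally mean 8/255 in pixel space

# Dummy data so the sketch runs end to end; replace with the real dataset.
data = torch.utils.data.TensorDataset(
    torch.rand(64, 3, 224, 224), torch.randint(0, num_classes, (64,))
)
loader = torch.utils.data.DataLoader(data, batch_size=32)

for x, y in loader:
    x_adv = fgsm_attack(model, x, y, eps)    # one extra forward/backward per batch
    loss = F.cross_entropy(model(x_adv), y)  # train the head on adversarial inputs
    opt.zero_grad()
    loss.backward()                          # gradients flow only into model.fc
    opt.step()
```

Because the attack adds only a single extra forward/backward pass per batch, this loop stays within a small constant factor of the cost of standard linear probing, whereas a $K$-step PGD attack in its place would multiply the attack cost by $K$.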