🤖 AI Summary
To address teacher model degradation and inefficient robust knowledge transfer in adversarial distillation, this paper proposes AdaGAT, a novel adaptive guidance adversarial training framework. Its core innovation is a pair of separate loss functions that dynamically regulate the training state of a learnable teacher (guide) model, enabling it to maintain strong robustness throughout co-training with the student and to participate actively in gradient-based optimization. Combining adversarial distillation with this adaptive guidance mechanism, AdaGAT is evaluated with WideResNet-34-10 as the target model on CIFAR-10, CIFAR-100, and TinyImageNet. Experiments show that AdaGAT significantly improves student robustness against strong adversarial attacks, including PGD and AutoAttack, with average gains of 2.1–4.7 percentage points while preserving high natural accuracy, and that it consistently outperforms state-of-the-art adversarial distillation baselines across all benchmarks.
📝 Abstract
Adversarial distillation (AD) is a knowledge distillation technique that transfers robustness from a teacher deep neural network (DNN) to a lightweight target (student) DNN, enabling the target model to perform better than it would if trained independently. Some previous works use a small, learnable teacher (guide) model to improve the robustness of a student model. Because such a guide model starts learning from scratch, keeping it in an optimal state for effective knowledge transfer during co-training is challenging. We therefore propose a novel Adaptive Guidance Adversarial Training (AdaGAT) method, which dynamically adjusts the training state of the guide model to instill robustness in the target model. Specifically, we develop two separate loss functions as part of AdaGAT, allowing the guide model to participate more actively in backpropagation and reach its optimal state. We evaluated our approach via extensive experiments on three datasets, CIFAR-10, CIFAR-100, and TinyImageNet, with WideResNet-34-10 as the target model. Our observations reveal that adjusting the guide model within an appropriate accuracy range enhances the target model's robustness against various adversarial attacks compared to a variety of baseline models.
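To make the AD pipeline the abstract describes concrete, the sketch below shows the generic shape of one training step: craft an L∞ PGD adversarial example against the student, then form one loss for the student (cross-entropy plus distillation toward the guide) and a separate loss that keeps the learnable guide itself trained. It uses a toy linear softmax classifier in plain numpy; the models, weight names, loss weighting, and the specific guide loss are illustrative assumptions, not AdaGAT's exact formulation (the paper's adaptive adjustment is not fully specified in the abstract).

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    # KL(p || q) between two probability vectors
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def pgd_attack(x, y, W, steps=10, eps=8 / 255, alpha=2 / 255):
    """L_inf PGD against a linear softmax classifier (logits = x @ W):
    ascend the cross-entropy gradient, then project back into the eps-ball."""
    x_adv = x.copy()
    for _ in range(steps):
        p = softmax(x_adv @ W)
        grad = W @ (p - y)  # dCE/dx for this linear model (dCE/dlogits = p - y)
        x_adv = np.clip(x_adv + alpha * np.sign(grad), x - eps, x + eps)
    return x_adv

# Toy setup: 16-dim input, 4 classes; separate student and guide weights
# (hypothetical stand-ins for the student and learnable guide networks).
d, k = 16, 4
x = rng.normal(size=d)
y = np.eye(k)[1]  # one-hot ground-truth label
W_student = rng.normal(scale=0.1, size=(d, k))
W_guide = rng.normal(scale=0.1, size=(d, k))

# Craft the adversarial example against the student, as in AD.
x_adv = pgd_attack(x, y, W_student)
p_student = softmax(x_adv @ W_student)
p_guide = softmax(x_adv @ W_guide)

# Loss 1 (target/student): hard-label CE plus distillation toward the guide.
ce_student = -float(np.sum(y * np.log(p_student + 1e-12)))
loss_target = 0.5 * ce_student + 0.5 * kl_div(p_guide, p_student)

# Loss 2 (guide): keep the guide accurate on the adversarial example so it
# stays in a useful state during co-training -- an assumed stand-in for
# AdaGAT's adaptive adjustment of the guide.
loss_guide = -float(np.sum(y * np.log(p_guide + 1e-12)))
```

In a real implementation both losses would be backpropagated through their respective networks each step; the point here is only the data flow: one attack against the student, two separate losses so the guide also learns rather than staying frozen.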