Learning Robustness at Test-Time from a Non-Robust Teacher

📅 2026-04-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of improving the adversarial robustness of a non-robust pretrained model on a target distribution using only a small set of unlabeled target samples at test time. The authors propose an unsupervised test-time adaptation framework that uses the predictions of the non-robust teacher model as a semantic anchor, replacing the self-consistency regularization commonly used in classical adversarial training. The anchor is applied to both the clean and the adversarial objectives, and theoretical insights indicate that this formulation is more stable than the self-consistency-based alternative. Combining adversarial training, knowledge distillation, and unsupervised learning, the approach is evaluated on CIFAR-10 and ImageNet under induced photometric transformations, where it shows improved optimization stability, lower sensitivity to hyperparameter choices, and a better robustness-accuracy trade-off than existing baselines.

📝 Abstract

Pretrained models are increasingly used as general-purpose backbones and adapted at test time to downstream environments where target data are scarce and unlabeled. While this paradigm has proven effective for improving clean accuracy on the target domain, adversarial robustness has received far less attention, especially when the original pretrained model is not explicitly designed to be robust. This raises a practical question: can a pretrained, non-robust model be adapted at test time to improve adversarial robustness on a target distribution? To address this question, this work studies how adversarial training strategies behave when integrated into adaptation schemes for the unsupervised test-time setting, where only a small set of unlabeled target samples is available. It first analyzes how classical adversarial training formulations can be extended to this scenario, showing that straightforward distillation-based adaptations remain unstable and highly sensitive to hyperparameter tuning, particularly when the teacher itself is non-robust. To overcome these limitations, the work proposes a label-free framework that uses the predictions of a non-robust teacher model as a semantic anchor for both the clean and adversarial objectives during adaptation. It further provides theoretical insights showing that this formulation yields a more stable alternative to the self-consistency-based regularization commonly used in classical adversarial training. Experiments evaluate the proposed approach on CIFAR-10 and ImageNet under induced photometric transformations. The results support the theoretical insights: the proposed approach achieves improved optimization stability, lower sensitivity to parameter choices, and a better robustness-accuracy trade-off than existing baselines in this post-deployment test-time setting.
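
The abstract does not give the loss in closed form, so the following is only a minimal sketch of the contrast it describes, under stated assumptions: KL divergence as the discrepancy measure, a weight `beta` on the adversarial term, and the function names are all hypothetical and not taken from the paper. It compares a TRADES-style self-consistency regularizer, where the student's adversarial prediction is pulled toward its own clean prediction, with a teacher-anchored objective, where both student predictions are pulled toward the teacher's prediction on the clean input.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def self_consistency_loss(student_clean_logits, student_adv_logits):
    """TRADES-style regularizer: the target is the student's own clean output."""
    return kl_divergence(softmax(student_clean_logits),
                         softmax(student_adv_logits))


def teacher_anchored_loss(teacher_clean_logits, student_clean_logits,
                          student_adv_logits, beta=1.0):
    """Assumed form: the teacher's clean prediction anchors both terms.

    `beta` (hypothetical) weights the adversarial term against the clean term.
    """
    anchor = softmax(teacher_clean_logits)
    clean_term = kl_divergence(anchor, softmax(student_clean_logits))
    adv_term = kl_divergence(anchor, softmax(student_adv_logits))
    return clean_term + beta * adv_term


if __name__ == "__main__":
    teacher = [4.0, 0.0, 0.0]   # teacher favors class 0
    drifted = [0.0, 4.0, 0.0]   # student drifted to class 1 on both inputs
    print("self-consistency:", self_consistency_loss(drifted, drifted))
    print("teacher-anchored:", teacher_anchored_loss(teacher, drifted, drifted))
```

In this toy case the student gives the same drifted prediction on the clean and the adversarial input, so the self-consistency term vanishes while the teacher-anchored loss stays positive. That is one plausible reading of why a fixed anchor could make label-free adaptation more stable; the paper's own theoretical argument is not reproduced here.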

Problem

Research questions and friction points this paper is trying to address.

test-time adaptation

adversarial robustness

non-robust teacher

unsupervised adaptation

pretrained models

Innovation

Methods, ideas, or system contributions that make the work stand out.

test-time adaptation

adversarial robustness

non-robust teacher