🤖 AI Summary
This work addresses the vulnerability of Transformer backbones to adversarial attacks. We propose Learnable Robustness Tokens, a plug-and-play, lightweight defense module that requires no architectural modification or full-model retraining. Our method introduces trainable, attention-guided token embeddings, optimized end-to-end through joint adversarial training, gradient masking, and feature disentanglement. Crucially, it is the first method to employ learnable tokens as a general-purpose robustness mechanism, dynamically injecting salient defensive features into the representation space. Evaluated on the ImageNet-C and AutoAttack benchmarks, our approach improves robust accuracy by 12.3% on average while degrading clean accuracy by less than 0.5%, substantially outperforming existing plug-and-play defenses.
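To make the injection mechanism concrete, here is a minimal sketch of how learnable robustness tokens could be prepended to a Vision Transformer's token sequence. All dimensions, the `inject` helper, and the placement after the `[CLS]` token are illustrative assumptions, not details from the paper; in the actual method only these tokens would be trainable, with the backbone kept frozen (similar in spirit to prompt tuning):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions for illustration): 196 patch tokens,
# embedding dim 64, and K = 4 learnable robustness tokens.
num_patches, dim, num_robust = 196, 64, 4

# Frozen backbone inputs for one image: [CLS] token plus patch tokens.
cls_token = rng.standard_normal((1, dim))
patch_tokens = rng.standard_normal((num_patches, dim))

# The defense's only trainable parameters: the robustness tokens,
# initialized near zero so they start as a near-no-op.
robust_tokens = 0.02 * rng.standard_normal((num_robust, dim))

def inject(cls_tok, patches, r_tokens):
    """Prepend the robustness tokens between [CLS] and the patches,
    so self-attention can mix their defensive features into every token."""
    return np.concatenate([cls_tok, r_tokens, patches], axis=0)

seq = inject(cls_token, patch_tokens, robust_tokens)
print(seq.shape)  # (1 + 4 + 196, 64) -> (201, 64)
```

Because the backbone weights stay fixed, the module is plug-and-play: training reduces to optimizing the small `(K, dim)` token matrix under the adversarial objective, and removing the tokens recovers the original model exactly.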