🤖 AI Summary
This work investigates the robustness risks of self-supervised learning (SSL)-pretrained vision backbones (ResNet/ViT) under adversarial attacks. Focusing on transferability-induced fragility introduced during fine-tuning, we systematically evaluate 20,000 fine-tuning configurations to uncover latent mechanisms behind robustness degradation. We propose *backbone attack*, a novel black-box method that approximates white-box attack performance using only the frozen backbone—without access to the classifier head. Additionally, we introduce a *surrogate model framework* to quantify how meta-information—specifically architecture, data, and optimization strategy choices—contributes to adversarial transferability. Experiments show that backbone attack significantly outperforms conventional black-box attacks and closely matches white-box performance; surrogate-based attacks achieve high transferability with minimal hyperparameter knowledge; and the relative influence of each fine-tuning dimension is explicitly characterized.
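The backbone attack described above needs only the frozen feature extractor: perturb the input so its embedding moves away from the clean embedding, with no access to the classifier head. A minimal PGD-style sketch of that idea, using a toy linear "backbone" so the gradient is analytic (the function names, shapes, and hyperparameters are illustrative assumptions, not the paper's actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen "backbone": a fixed linear feature extractor f(x) = W @ x.
# The paper's backbones are deep networks (ResNet/ViT); a linear map stands in
# here so the attack loop stays self-contained and runnable.
W = rng.standard_normal((8, 16))

def features(x):
    return W @ x

def backbone_attack(x, eps=0.1, alpha=0.02, steps=20):
    """PGD-style attack using only the backbone: maximize the feature-space
    distance between the perturbed and clean inputs (no classifier needed)."""
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)  # random start in the eps-ball
    f_clean = features(x)
    for _ in range(steps):
        # Gradient of ||f(x_adv) - f(x)||^2 w.r.t. x_adv for this linear backbone:
        grad = 2.0 * W.T @ (features(x_adv) - f_clean)
        x_adv = x_adv + alpha * np.sign(grad)      # ascend the feature distance
        x_adv = x + np.clip(x_adv - x, -eps, eps)  # project back into the eps-ball
    return x_adv

x = rng.standard_normal(16)
x_adv = backbone_attack(x)
print(np.linalg.norm(features(x_adv) - features(x)))  # feature-space shift achieved
```

Because a large feature-space shift usually crosses decision boundaries of any head fine-tuned on those features, such samples tend to fool the downstream classifier even though it was never queried.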
📝 Abstract
Advances in self-supervised learning (SSL) for machine vision have improved representation robustness and model performance, giving rise to pre-trained backbones such as *ResNet* and *ViT* models tuned with SSL methods like *SimCLR*. Because pre-training is computationally and data intensive, reusing such backbones has become a practical necessity. However, models built on these backbones may inherit their vulnerabilities to adversarial attacks. While adversarial robustness has been studied in *white-box* and *black-box* settings, the robustness of models fine-tuned on pre-trained backbones remains largely unexplored. Additionally, the role of tuning meta-information in mitigating exploitation risks is unclear. This work systematically evaluates the adversarial robustness of such models across 20,000 combinations of tuning meta-information, including fine-tuning techniques, backbone families, datasets, and attack types. We propose using proxy models to transfer attacks, simulating varying levels of target knowledge by fine-tuning these proxies with diverse configurations. Our findings reveal that proxy-based attacks approach the effectiveness of *white-box* methods, even with minimal tuning knowledge. We also introduce a naive *backbone attack*, which leverages only the backbone to generate adversarial samples; it outperforms *black-box* attacks and rivals *white-box* methods, highlighting critical risks in model-sharing practices. Finally, our ablations reveal how increasing tuning meta-information impacts attack transferability, quantifying the contribution of each meta-information combination.
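The proxy-model idea can be illustrated in miniature: craft the adversarial example on a surrogate the attacker controls, then apply it unchanged to the unseen target. A deterministic toy with two linear binary classifiers whose weights are similar (as when both are fine-tuned from the same backbone); all weights and values here are illustrative assumptions, not from the paper:

```python
import numpy as np

# Hypothetical linear binary classifiers: class = sign(w @ x). The surrogate is
# a proxy the attacker fine-tuned; the target is the victim model, assumed to
# share a backbone and therefore have similar weights (illustrative values).
w_surrogate = np.array([1.0, -1.0])
w_target    = np.array([0.9, -1.1])

def predict(w, x):
    return 1 if w @ x > 0 else -1

def fgsm_transfer(x, eps):
    """FGSM crafted on the surrogate only: step against the sign of the
    surrogate's score gradient (for a linear score w @ x, the gradient is w)."""
    return x - eps * np.sign(w_surrogate)

x = np.array([0.2, 0.1])           # clean input, class +1 under both models
x_adv = fgsm_transfer(x, eps=0.2)  # crafted using surrogate knowledge only

print(predict(w_surrogate, x), predict(w_target, x))          # 1 1
print(predict(w_surrogate, x_adv), predict(w_target, x_adv))  # -1 -1
```

The adversarial input flips both models even though the target was never consulted; the closer the surrogate's fine-tuning configuration is to the target's, the more reliably this transfer works, which is what the 20,000-configuration sweep measures at scale.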