🤖 AI Summary
To address the weak cross-model transferability of adversarial examples in black-box settings, this paper proposes a Bayesian randomization framework that jointly models the input space and the model parameter space. The method co-designs input posterior approximations and parameter priors, integrating variational inference with stochastic adversarial optimization, and its objective implicitly encourages flat minima in both the parameter space and the input space, which significantly improves transferability. As a transfer-based attack, the approach requires no access to or fine-tuning of the target model. Evaluated on ImageNet and CIFAR-10, it improves the average transfer attack success rate over the authors' ICLR baseline by 19.14% and 2.08%, respectively, establishing new state-of-the-art performance.
📝 Abstract
This paper presents a substantial extension of our work published at ICLR. Our ICLR work advocated enhancing the transferability of adversarial examples by incorporating a Bayesian formulation into the model parameters, which effectively emulates an ensemble of infinitely many deep neural networks. In this paper, we introduce a novel extension that incorporates the Bayesian formulation into the model input as well, enabling joint diversification of both the model input and the model parameters. Our empirical findings demonstrate that: 1) combining Bayesian formulations for the model input and the model parameters yields significant improvements in transferability; 2) introducing advanced approximations of the posterior distribution over the model input enhances adversarial transferability further, surpassing all state-of-the-art methods when attacking without model fine-tuning. Moreover, we propose a principled approach to fine-tuning model parameters in this extended Bayesian formulation. The derived optimization objective inherently encourages flat minima in both the parameter space and the input space. Extensive experiments demonstrate that our method achieves a new state of the art on transfer-based attacks, improving the average success rate on ImageNet and CIFAR-10 by 19.14% and 2.08%, respectively, compared with our basic Bayesian method from the ICLR paper. We will make our code publicly available.
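The idea of jointly diversifying the model input and the model parameters can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy logistic model, uses isotropic Gaussian noise as a stand-in for the distributions over parameters and inputs, and takes a single sign-gradient step. All function names and the values of `sigma_w`, `sigma_x`, `n_samples`, and `epsilon` are hypothetical.

```python
import numpy as np

def loss_grad_wrt_input(w, x, y):
    """Gradient of the logistic loss log(1 + exp(-y * w.x)) with respect to x."""
    margin = y * np.dot(w, x)
    return -y * w / (1.0 + np.exp(margin))

def bayesian_input_gradient(w, x, y, sigma_w=0.05, sigma_x=0.05,
                            n_samples=32, seed=0):
    """Monte Carlo average of the input gradient over sampled parameters and inputs.

    Assumption: Gaussian perturbations around w and x stand in for the
    distributions over model parameters and model inputs.
    """
    rng = np.random.default_rng(seed)
    grad_sum = np.zeros_like(x)
    for _ in range(n_samples):
        w_s = w + sigma_w * rng.standard_normal(w.shape)  # sampled parameters
        x_s = x + sigma_x * rng.standard_normal(x.shape)  # sampled input
        grad_sum += loss_grad_wrt_input(w_s, x_s, y)
    return grad_sum / n_samples

def attack_step(w, x, y, epsilon=0.1, **kwargs):
    """One sign-gradient ascent step on the averaged loss (L-infinity budget epsilon)."""
    g = bayesian_input_gradient(w, x, y, **kwargs)
    return x + epsilon * np.sign(g)

# Toy usage: the perturbed input has a smaller margin on the substitute model.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.5, -0.5, 1.0])
y = 1.0
x_adv = attack_step(w, x, y)
print("margin before:", y * np.dot(w, x))
print("margin after: ", y * np.dot(w, x_adv))
```

Averaging gradients over many sampled parameter and input perturbations pushes the update toward directions that raise the loss across a neighborhood rather than at a single point, which is the intuition behind the abstract's link between this formulation and flat minima.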