🤖 AI Summary
In black-box adversarial attacks, increasing the number of surrogate models improves transferability but degrades computational efficiency, making it challenging to balance both objectives. To address this, we propose the Selective Ensemble Attack (SEA) framework, the first to decouple surrogate model diversity *within* and *across* iterations, thereby relaxing the implicit assumption that the same surrogates must be used in every iteration. SEA introduces a lightweight dynamic selection mechanism that adaptively constructs diverse surrogate ensembles across iterations while keeping the per-iteration model count fixed. Evaluated on ImageNet, SEA achieves an 8.5% higher transfer success rate than state-of-the-art methods under the same efficiency. Moreover, it significantly enhances attack effectiveness against commercial vision APIs and large vision-language models. Our approach jointly optimizes transferability and computational efficiency without compromising either.
📝 Abstract
In surrogate ensemble attacks, using more surrogate models yields higher transferability but lower resource efficiency. This practical trade-off between transferability and efficiency has largely limited existing attacks, even though many pre-trained models are easily accessible online. In this paper, we argue that this trade-off stems from an unnecessary common assumption, i.e., that the same set of models must be used across all iterations. By lifting this assumption, we can use as many surrogates as we want to unleash transferability without sacrificing efficiency. Concretely, we propose Selective Ensemble Attack (SEA), which dynamically selects diverse models (from easily accessible pre-trained models) across iterations based on our new interpretation of decoupling within-iteration and cross-iteration model diversity. In this way, the number of within-iteration models is fixed to maintain efficiency, while only cross-iteration model diversity is increased for higher transferability. Experiments on ImageNet demonstrate the superiority of SEA in various scenarios. For example, when dynamically selecting 4 from 20 accessible models, SEA yields 8.5% higher transferability than existing attacks under the same efficiency. The superiority of SEA also generalizes to real-world systems, such as commercial vision APIs and large vision-language models. Overall, SEA opens up the possibility of adaptively balancing transferability and efficiency according to specific resource requirements.
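The core idea of decoupling within-iteration and cross-iteration diversity can be illustrated with a minimal sketch. Note this is an illustrative toy, not the paper's implementation: `surrogate_grad`, the random subset-selection policy, and all constants (`POOL_SIZE`, `K`, etc.) are assumptions standing in for real surrogate models and SEA's actual selection criterion.

```python
import random

# Toy sketch of the decoupling idea described above: the number of
# surrogates evaluated in EACH iteration stays fixed at K (efficiency),
# while WHICH surrogates are used varies ACROSS iterations (diversity).
# All names here are hypothetical, not from the paper.

POOL_SIZE = 20   # accessible pre-trained surrogate models
K = 4            # surrogates actually evaluated per iteration
ITERATIONS = 10
STEP = 0.1

def surrogate_grad(model_id, x):
    # Stand-in for back-propagating a loss through surrogate `model_id`;
    # a real attack would return the input gradient of that model's loss.
    return (model_id % 3) - 1 + 0.0 * x

x = 0.0                      # the adversarial perturbation (scalar toy)
rng = random.Random(0)
models_evaluated = 0
distinct = set()
for _ in range(ITERATIONS):
    subset = rng.sample(range(POOL_SIZE), K)   # cross-iteration diversity
    distinct.update(subset)
    g = sum(surrogate_grad(m, x) for m in subset) / K  # ensemble gradient
    x += STEP * g                                      # toy update step
    models_evaluated += K    # per-iteration cost is always K models
```

The per-iteration cost is identical to a static 4-model ensemble, yet many more distinct surrogates contribute over the full attack, which is the source of the extra transferability the abstract reports.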