🤖 AI Summary
This work investigates the theoretical mechanisms underlying cross-model transferability of adversarial examples in ensemble-based adversarial attacks. Method: We introduce the notion of "transfer error" and, for the first time, rigorously decompose it into quantifiable components: model vulnerability, ensemble diversity, and a constant. Leveraging Rademacher complexity and information-theoretic analysis, we derive an upper bound on transfer error and extract three actionable guidelines for suppressing it. Our approach integrates theoretical analysis with multi-model collaborative optimization. Results: Extensive evaluation across 54 heterogeneous models demonstrates that the proposed framework significantly improves adversarial transfer success rates, offering a new perspective on deep learning robustness assessment that bridges theoretical rigor with practical deployability.
📝 Abstract
Model ensemble adversarial attack has become a powerful method for generating transferable adversarial examples that can target even unknown models, but its theoretical foundation remains underexplored. To address this gap, we provide early theoretical insights that serve as a roadmap for advancing model ensemble adversarial attack. We first define transferability error to measure the error in adversarial transferability, alongside concepts of diversity and empirical model ensemble Rademacher complexity. We then decompose the transferability error into vulnerability, diversity, and a constant, which rigorously explains the origin of transferability error in model ensemble attack: the vulnerability of an adversarial example to ensemble components, and the diversity of ensemble components. Furthermore, we apply recent mathematical tools from information theory to bound the transferability error using complexity and generalization terms, yielding three practical guidelines for reducing transferability error: (1) incorporating more surrogate models, (2) increasing their diversity, and (3) reducing their complexity in cases of overfitting. Finally, extensive experiments with 54 models validate our theoretical framework, representing a significant step forward in understanding transferable model ensemble adversarial attacks.
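To make the setting concrete, the basic model ensemble attack crafts an adversarial example by ascending the average loss of several surrogate models, hoping the perturbation transfers to an unseen target. The sketch below is a minimal illustration only, not the paper's method: it uses three toy linear "models" and an FGSM-style sign-gradient loop (the paper's experiments involve 54 deep models, and its contribution is the theory bounding the transfer error of such attacks).

```python
import numpy as np

# Toy setup (hypothetical, for illustration): surrogate "models" are
# linear scorers w_i . x with true label y in {-1, +1}.
rng = np.random.default_rng(0)
surrogates = [rng.normal(size=5) for _ in range(3)]  # 3 surrogate weight vectors
x = rng.normal(size=5)  # clean input
y = 1.0                 # true label

def margin_loss(w, x, y):
    # Larger loss means the example is "more adversarial" for this model.
    return -y * (w @ x)

def ensemble_grad(ws, x, y):
    # Guideline (1) in spirit: averaging over more (and more diverse)
    # surrogates targets shared weaknesses rather than one model's quirks.
    return np.mean([-y * w for w in ws], axis=0)

eps, alpha, steps = 0.5, 0.1, 10
x_adv = x.copy()
for _ in range(steps):
    g = ensemble_grad(surrogates, x_adv, y)
    x_adv = x_adv + alpha * np.sign(g)        # sign-gradient ascent step
    x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the L_inf ball

before = np.mean([margin_loss(w, x, y) for w in surrogates])
after = np.mean([margin_loss(w, x_adv, y) for w in surrogates])
print(after >= before)  # the average ensemble loss does not decrease
```

In the paper's terms, the gap between fooling the surrogate ensemble (driving `after` up) and fooling an unseen target model is exactly what the transferability error quantifies.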