AI Summary
In distributed deep learning, attackers can mount effective evasion attacks by intercepting the intermediate features exchanged between edge and cloud devices, even when both models are inaccessible (i.e., black-box). This paper is the first to reveal the strong cross-device transferability of such intermediate features. We propose a statistics-driven proxy distillation method that requires no model access: it reconstructs the original tensor shape from the statistical distribution of intercepted features and employs an adaptive lightweight proxy architecture to enable efficient feature distillation and adversarial example generation. Experiments across multiple mainstream distributed architectures demonstrate that our approach significantly improves transfer-based attack success rates (average +28.7%). This work uncovers a novel systemic security risk arising from intermediate feature leakage and establishes a new paradigm for security assessment in distributed AI systems.
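To make the shape-recovery step concrete, below is a minimal sketch of how an attacker might infer a plausible (C, H, W) layout from flattened features captured on the wire. It assumes square spatial maps and scores candidate shapes by the stability of per-channel post-ReLU sparsity across samples; the statistical criterion actually used in the paper may differ, and the function names and the stand-in data are illustrative only.

```python
import numpy as np

def candidate_shapes(n, max_channels=2048):
    """Enumerate (C, H, W) factorizations of n with a square spatial map,
    a common layout for CNN intermediate features."""
    shapes = []
    for c in range(1, max_channels + 1):
        if n % c:
            continue
        hw = n // c
        h = int(round(hw ** 0.5))
        if h * h == hw:
            shapes.append((c, h, h))
    return shapes

def score_shape(flat_batch, shape):
    """Score a candidate shape: a correct channel split should yield stable
    channel-wise statistics (here, post-ReLU sparsity) across samples."""
    c, h, w = shape
    x = flat_batch.reshape(len(flat_batch), c, h * w)
    sparsity = (x <= 0).mean(axis=2)        # per-sample, per-channel zero rate
    return -sparsity.std(axis=0).mean()     # lower cross-sample variance = better

def recover_shape(flat_batch):
    """Pick the factorization whose channel statistics are most consistent."""
    n = flat_batch.shape[1]
    return max(candidate_shapes(n), key=lambda s: score_shape(flat_batch, s))

# Stand-in for a batch of intercepted, flattened edge features (post-ReLU).
batch = np.maximum(np.random.randn(32, 64 * 14 * 14), 0)
print(recover_shape(batch))
```

Once a plausible shape is recovered, the attacker can pick a proxy architecture whose intermediate feature map matches it, which is what enables the feature distillation described next.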
Abstract
As machine learning models are increasingly deployed at the edge of Internet of Things environments, a partitioned deep learning paradigm in which models are split across multiple computational nodes introduces a new dimension of security risk. Unlike traditional inference setups, these distributed pipelines spread the model computation across heterogeneous nodes and communication layers, thereby exposing a broader attack surface to potential adversaries. Building on these motivations, this work explores a previously overlooked vulnerability: even when both the edge and cloud components of the model are inaccessible (i.e., black-box), an adversary who intercepts the intermediate features transmitted between them can still pose a serious threat. We demonstrate that, under these mild and realistic assumptions, an attacker can craft highly transferable proxy models, making the entire deep learning system significantly more vulnerable to evasion attacks. In particular, the intercepted features can be effectively analyzed and leveraged to distill surrogate models capable of crafting highly transferable adversarial examples against the target model. To this end, we propose an exploitation strategy specifically designed for distributed settings, which involves reconstructing the original tensor shape from vectorized transmitted features using simple statistical analysis and adapting surrogate architectures accordingly to enable effective feature distillation. A comprehensive and systematic experimental evaluation demonstrates that surrogate models trained with the proposed strategy, i.e., leveraging intermediate features, substantially improve the transferability of adversarial attacks. These findings underscore the urgent need to account for intermediate feature leakage in the design of secure distributed deep learning systems.
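As an illustration of the attack flow sketched in the abstract, the following hedged PyTorch example distills a lightweight surrogate of the unknown edge sub-model from pairs of inputs and intercepted features, then crafts adversarial examples by perturbing the surrogate's feature representation. The `ProxyEdge` architecture, the MSE distillation loss, and the feature-space PGD objective are plausible instantiations chosen for illustration, not the paper's exact method.

```python
import torch
import torch.nn as nn

class ProxyEdge(nn.Module):
    """Hypothetical lightweight proxy standing in for the unknown edge sub-model,
    sized so its output matches the recovered feature shape."""
    def __init__(self, out_channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_channels, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

def distill_proxy(proxy, loader, epochs=5, lr=1e-3):
    """Fit the proxy so its features match the intercepted edge features (MSE).
    `loader` is assumed to yield (input image, captured feature tensor) pairs."""
    opt = torch.optim.Adam(proxy.parameters(), lr=lr)
    for _ in range(epochs):
        for images, captured_feats in loader:
            loss = nn.functional.mse_loss(proxy(images), captured_feats)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return proxy

def feature_space_pgd(proxy, x, eps=8 / 255, alpha=2 / 255, steps=10):
    """Craft adversarial examples by pushing the proxy's features away from
    their clean values; transfer to the real edge-cloud pipeline is the
    attack's working hypothesis."""
    x_adv = x.clone().detach()
    clean_feats = proxy(x).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = nn.functional.mse_loss(proxy(x_adv), clean_feats)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv + alpha * grad.sign()).detach()
        x_adv = x + (x_adv - x).clamp(-eps, eps)   # project into the eps-ball
        x_adv = x_adv.clamp(0, 1)                  # keep a valid image range
    return x_adv
```

The key design point is that neither step queries the edge or cloud model: distillation uses only passively intercepted traffic, and the adversarial examples are generated entirely against the surrogate before being fed to the real pipeline.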