🤖 AI Summary
This work proposes a novel backdoor attack that exploits the subtle numerical discrepancies different hardware introduces during inference to trigger targeted prediction flips. By combining numerical error analysis, decision-boundary manipulation, and adversarial training, the method tailors a model's behavior so that the same input yields divergent predictions on specific hardware platforms. The attack's reliability is validated empirically across several mainstream GPU accelerators, and a systematic evaluation exposes the limitations of current defense strategies. This study is the first demonstration of a hardware-dependent backdoor attack, expanding the threat landscape for machine learning security by showing how hardware-induced numerical variations can be weaponized to compromise model integrity.
📝 Abstract
Machine learning models are routinely deployed on a wide range of computing hardware. Although such hardware is typically expected to produce identical results, differences in its design can lead to small numerical variations during inference. In this work, we show that these variations can be exploited to create backdoors in machine learning models. The core idea is to shape the model's decision function such that it yields different predictions for the same input when executed on different hardware. This effect is achieved by locally moving the decision boundary close to a target input and then refining numerical deviations to flip the prediction on selected hardware. We empirically demonstrate that these hardware-triggered backdoors can be created reliably across common GPU accelerators. Our findings reveal a novel attack vector affecting the use of third-party models, and we investigate different defenses to counter this threat.
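The root cause the abstract relies on is that floating-point addition is not associative, so the order in which a hardware platform accumulates a dot product changes the rounded result. The sketch below is purely illustrative, not the paper's method: it uses summation order as a hypothetical stand-in for cross-hardware reduction differences, and a hand-picked threshold to mimic a decision boundary that has been moved next to the target input.

```python
import numpy as np

# Contributions to a single logit, in float32 (as in typical GPU
# accumulators). Mixing large and small magnitudes makes the sum
# depend on accumulation order -- the same non-associativity that
# differs across hardware reduction trees.
terms = np.array([1.0, 1e8, -1e8], dtype=np.float32)

def accumulate(values):
    """Sequential float32 sum in the given order."""
    total = np.float32(0.0)
    for v in values:
        total = np.float32(total + v)
    return total

logit_a = accumulate(terms)        # "hardware A" order: (1 + 1e8) - 1e8 = 0.0
logit_b = accumulate(terms[::-1])  # "hardware B" order: (-1e8 + 1e8) + 1 = 1.0

# With the decision boundary engineered to sit between the two
# rounded results (hypothetical threshold 0.5), the same input is
# classified differently on the two "platforms".
pred_a = logit_a > 0.5   # False
pred_b = logit_b > 0.5   # True
print(logit_a, logit_b, pred_a, pred_b)
```

In the real attack the gap between platforms is far smaller (on the order of a few ulps), which is why the boundary must first be moved extremely close to the target input before the residual numerical deviation can flip the prediction.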