🤖 AI Summary
This work identifies a novel threat in the deep learning model supply chain: self-extracting, self-executing malware propagated via pre-trained, third-party neural networks. To demonstrate the threat, the authors propose MaleficNet 2.0, the first neural-network-level payload injection framework to achieve stealth, robustness, and preserved model functionality simultaneously. The method embeds executable malicious payloads into model weights using spread-spectrum channel coding combined with LDPC error-correcting codes, enabling reliable payload recovery even when model parameters are represented with a reduced number of bits. The injected weight perturbations are kept small enough that model performance does not degrade. MaleficNet 2.0 applies unchanged to both centralized training and distributed settings such as federated learning. An end-to-end proof-of-concept implementation in PyTorch demonstrates automatic payload extraction and execution during inference. Crucially, the injected malware is resilient to common removal techniques, including pruning, quantization, and fine-tuning, while the model's functionality and fidelity remain intact.
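To make the channel-coding idea concrete, below is a minimal, illustrative sketch of CDMA-style spread-spectrum embedding and correlation-based extraction over a flat weight tensor. Everything here is a simplifying assumption rather than the authors' implementation: the function names, the scaling factor `gamma`, and the omission of the LDPC layer that, in the paper, corrects residual bit errors on top of the spread-spectrum channel.

```python
import torch

def embed_payload(weights: torch.Tensor, payload_bits: torch.Tensor,
                  gamma: float = 0.02, seed: int = 0) -> torch.Tensor:
    """Spread each payload bit over all weights with a pseudorandom
    +/-1 chip sequence (CDMA-style) and add the scaled sum as a small
    perturbation. gamma trades decoding robustness against stealth."""
    g = torch.Generator().manual_seed(seed)
    w = weights.flatten().clone()
    # One +/-1 spreading sequence per payload bit, shape (bits, n).
    chips = torch.randint(0, 2, (payload_bits.numel(), w.numel()), generator=g) * 2 - 1
    symbols = payload_bits.float() * 2 - 1   # bits {0,1} -> symbols {-1,+1}
    w += gamma * (symbols.unsqueeze(1) * chips).sum(dim=0)
    return w.reshape(weights.shape)

def extract_payload(weights: torch.Tensor, num_bits: int, seed: int = 0) -> torch.Tensor:
    """Recover each bit from the sign of its chip correlation; an LDPC
    decoder (omitted here) would then fix any residual bit errors."""
    g = torch.Generator().manual_seed(seed)
    w = weights.flatten()
    chips = torch.randint(0, 2, (num_bits, w.numel()), generator=g) * 2 - 1
    return (chips.float() @ w > 0).to(torch.uint8)

# Round-trip check on stand-in "pre-trained" weights.
bits = torch.randint(0, 2, (32,), dtype=torch.uint8)
w = torch.randn(100_000)
assert torch.equal(extract_payload(embed_payload(w, bits), 32), bits)
```

Intuitively, each correlation accumulates a signal of magnitude `gamma * n` while interference from the carrier weights only grows like `sqrt(n)`, so spreading over millions of parameters lets the payload survive small perturbations such as pruning or fine-tuning; this spreading gain is the property the paper's robustness claims rest on.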
📝 Abstract
Training high-quality deep learning models is challenging due to steep computational and technical requirements. Individuals, institutions, and companies therefore increasingly rely on pre-trained, third-party models made available in public repositories. These models are often used directly or integrated into production pipelines without particular precautions, since they are perceived as mere data in tensor form and therefore safe. In this paper, we raise awareness of a new machine learning supply chain threat targeting neural networks. We introduce MaleficNet 2.0, a novel technique to embed self-extracting, self-executing malware in neural networks. MaleficNet 2.0 uses spread-spectrum channel coding combined with error correction techniques to inject malicious payloads into the parameters of deep neural networks. MaleficNet 2.0's injection technique is stealthy, does not degrade the performance of the model, and is robust against removal techniques. We design our approach to work both in traditional and in distributed learning settings such as Federated Learning, and demonstrate that it remains effective even when model parameters are represented with a reduced number of bits. Finally, we implement a proof-of-concept self-extracting neural network malware using MaleficNet 2.0, demonstrating the practicality of the attack against a widely adopted machine learning framework. With this work, we aim to raise awareness of this new class of dangerous attacks in both the research community and industry, and we hope to encourage further research into mitigation techniques against such threats.
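As a companion to the embedding sketch above, the following hypothetical fragment illustrates how a self-extracting trigger could look in PyTorch: a forward pre-hook that reconstructs the embedded bitstream from the weights the first time inference runs. The hook-based design and the names `_decode_bits` / `attach_extraction_hook` are assumptions for illustration, not the paper's proof-of-concept code, and the sketch deliberately stops at decoding bytes rather than executing anything.

```python
import torch
import torch.nn as nn

def _decode_bits(flat_w: torch.Tensor, num_bits: int, seed: int = 0) -> torch.Tensor:
    """Correlation decode matching the embedding sketch above: the
    sign of each chip correlation recovers the corresponding bit."""
    g = torch.Generator().manual_seed(seed)
    chips = torch.randint(0, 2, (num_bits, flat_w.numel()), generator=g) * 2 - 1
    return (chips.float() @ flat_w > 0).to(torch.uint8)

def attach_extraction_hook(model: nn.Module, num_bits: int, seed: int = 0) -> dict:
    """Install a forward pre-hook that reassembles the payload from the
    model's own weights on first inference. The paper's proof-of-concept
    goes further and executes the recovered payload."""
    state = {"done": False, "payload": b""}

    def _hook(module, inputs):
        if state["done"]:
            return
        state["done"] = True
        flat = torch.cat([p.detach().flatten() for p in module.parameters()])
        bits = _decode_bits(flat, num_bits, seed)
        state["payload"] = bytes(                 # pack 8 bits per byte
            int("".join(str(b.item()) for b in bits[i:i + 8]), 2)
            for i in range(0, num_bits - 7, 8)
        )

    model.register_forward_pre_hook(_hook)
    return state
```

Because extraction only reads the weights the model already carries, no network or filesystem access is needed at trigger time, which is exactly why treating serialized models as inert tensor data is dangerous.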