Machine Learning Models Have a Supply Chain Problem

📅 2025-05-28
🤖 AI Summary
Open machine learning model ecosystems face serious supply-chain risks, including malicious model substitution, vulnerable framework dependencies, and restricted or poisoned training data. This paper identifies three core risk categories in ML model supply chains and adapts Sigstore, a transparency framework for open-source software supply chains, to the ML domain, proposing a lightweight signing and provenance framework built on cosign and SLSA. The framework supports binary model signing, traceable training environments, declarative dataset compliance assertions, and embedded metadata verification, enabling strong publisher-identity binding and verifiable training provenance. Evaluation shows low runtime overhead, compatibility with existing ML toolchains, and end-to-end deployability. To the authors' knowledge, this is the first practical, production-ready trust-assurance solution for open-source ML supply chains.
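The summary above describes signing model binaries and verifying embedded metadata. As a rough illustration of the digest-pinning step such a scheme relies on, here is a minimal Python sketch: it checks a downloaded model file against the digest recorded in a SLSA-style provenance statement. The `provenance` layout and field names are assumptions for illustration, not the paper's actual format; in a real deployment the digest would additionally be covered by a cosign signature rather than compared in isolation.

```python
"""Sketch: verify a downloaded model artifact against provenance metadata."""
import hashlib


def sha256_digest(path: str) -> str:
    """Compute the SHA-256 digest of a model file, streaming in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_model(model_path: str, provenance: dict) -> bool:
    """Check the artifact against the digest pinned in provenance.

    `provenance` imitates a SLSA-style statement; the `subject`/`digest`
    field names here are hypothetical, chosen for the sketch.
    """
    expected = provenance["subject"][0]["digest"]["sha256"]
    return sha256_digest(model_path) == expected
```

A consumer would run this check after download and refuse to load the model on mismatch, so a substituted (e.g. malware-carrying) artifact fails closed.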

📝 Abstract
Powerful machine learning (ML) models are now readily available online, which creates exciting possibilities for users who lack the deep technical expertise or substantial computing resources needed to develop them. On the other hand, this type of open ecosystem comes with many risks. In this paper, we argue that the current ecosystem for open ML models contains significant supply-chain risks, some of which have been exploited already in real attacks. These include an attacker replacing a model with something malicious (e.g., malware), or a model being trained using a vulnerable version of a framework or on restricted or poisoned data. We then explore how Sigstore, a solution designed to bring transparency to open-source software supply chains, can be used to bring transparency to open ML models, in terms of enabling model publishers to sign their models and prove properties about the datasets they use.
Problem

Research questions and friction points this paper is trying to address.

Open ML models face supply-chain risks like malware substitution
Models may use vulnerable frameworks or poisoned training data
Sigstore can enhance transparency by enabling model signing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using Sigstore for ML model transparency
Signing models to verify authenticity
Proving dataset properties for security
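The "proving dataset properties" contribution can be pictured as a small, signable claim attached to the model release. The sketch below builds such a claim as an in-toto-style JSON statement; the predicate-type URI and predicate fields are illustrative assumptions, not the paper's schema.

```python
"""Sketch: a declarative dataset-compliance assertion as signable JSON."""
import json


def make_dataset_assertion(dataset_name: str, license_id: str, sha256: str) -> str:
    """Build an in-toto-style statement about the training dataset.

    The predicateType URI and predicate fields are hypothetical,
    invented for this sketch.
    """
    statement = {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": dataset_name, "digest": {"sha256": sha256}}],
        "predicateType": "https://example.com/dataset-compliance/v0.1",
        "predicate": {
            "license": license_id,  # e.g. an SPDX license identifier
            "containsRestrictedData": False,
        },
    }
    # Deterministic serialization, so the exact bytes being signed
    # (e.g. by cosign) are reproducible.
    return json.dumps(statement, sort_keys=True, separators=(",", ":"))
```

A publisher would sign the serialized statement alongside the model, letting consumers verify not just who published the model but what was asserted about its training data.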