Machine Learning Models Have a Supply Chain Problem

📅 2025-05-28
🤖 AI Summary
Open machine learning model ecosystems face serious supply-chain risks, including malicious model substitution, vulnerable framework dependencies, and restricted or poisoned training data. This paper identifies three core risk categories in ML model supply chains and adapts Sigstore, a transparency framework for open-source software supply chains, to the ML domain, proposing a lightweight signing and provenance framework built on cosign and SLSA. The framework supports binary model signing, traceable training environments, declarative dataset compliance assertions, and embedded metadata verification, enabling strong publisher-identity binding and verifiable training provenance. Evaluation shows low runtime overhead, compatibility with existing ML toolchains, and end-to-end deployability. To the authors' knowledge, this is the first practical, production-ready trust-assurance solution for open-source ML supply chains.
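The summary above describes signing model binaries and verifying embedded metadata. As a rough illustration of the digest-pinning step such a scheme relies on, here is a minimal Python sketch: it checks a downloaded model file against the digest recorded in a SLSA-style provenance statement. The `provenance` layout and field names are assumptions for illustration, not the paper's actual format; in a real deployment the digest would additionally be covered by a cosign signature rather than compared in isolation.

```python
"""Sketch: verify a downloaded model artifact against provenance metadata."""
import hashlib


def sha256_digest(path: str) -> str:
    """Compute the SHA-256 digest of a model file, streaming in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_model(model_path: str, provenance: dict) -> bool:
    """Check the artifact against the digest pinned in provenance.

    `provenance` imitates a SLSA-style statement; the `subject`/`digest`
    field names here are hypothetical, chosen for the sketch.
    """
    expected = provenance["subject"][0]["digest"]["sha256"]
    return sha256_digest(model_path) == expected
```

A consumer would run this check after download and refuse to load the model on mismatch, so a substituted (e.g. malware-carrying) artifact fails closed.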

📝 Abstract
Powerful machine learning (ML) models are now readily available online, which creates exciting possibilities for users who lack the deep technical expertise or substantial computing resources needed to develop them. On the other hand, this type of open ecosystem comes with many risks. In this paper, we argue that the current ecosystem for open ML models contains significant supply-chain risks, some of which have been exploited already in real attacks. These include an attacker replacing a model with something malicious (e.g., malware), or a model being trained using a vulnerable version of a framework or on restricted or poisoned data. We then explore how Sigstore, a solution designed to bring transparency to open-source software supply chains, can be used to bring transparency to open ML models, in terms of enabling model publishers to sign their models and prove properties about the datasets they use.
Problem

Research questions and friction points this paper is trying to address.

Open ML models face supply-chain risks like malware substitution
Models may use vulnerable frameworks or poisoned training data
Sigstore can enhance transparency by enabling model signing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using Sigstore for ML model transparency
Signing models to verify authenticity
Proving dataset properties for security
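The "proving dataset properties" contribution can be pictured as a small, signable claim attached to the model release. The sketch below builds such a claim as an in-toto-style JSON statement; the predicate-type URI and predicate fields are illustrative assumptions, not the paper's schema.

```python
"""Sketch: a declarative dataset-compliance assertion as signable JSON."""
import json


def make_dataset_assertion(dataset_name: str, license_id: str, sha256: str) -> str:
    """Build an in-toto-style statement about the training dataset.

    The predicateType URI and predicate fields are hypothetical,
    invented for this sketch.
    """
    statement = {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": dataset_name, "digest": {"sha256": sha256}}],
        "predicateType": "https://example.com/dataset-compliance/v0.1",
        "predicate": {
            "license": license_id,  # e.g. an SPDX license identifier
            "containsRestrictedData": False,
        },
    }
    # Deterministic serialization, so the exact bytes being signed
    # (e.g. by cosign) are reproducible.
    return json.dumps(statement, sort_keys=True, separators=(",", ":"))
```

A publisher would sign the serialized statement alongside the model, letting consumers verify not just who published the model but what was asserted about its training data.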