The Grand Software Supply Chain of AI Systems

📅 2026-04-30
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study addresses the lack of verifiability, versioning, observability, and traceability across the end-to-end AI pipeline, from data acquisition to inference. It models the AI software supply chain as four architectural layers: data acquisition, model training, model inference, and a cross-cutting substrate. Within this framework, the work identifies four structural gaps that conventional supply chain mechanisms do not address: undeclared behavioral couplings that no resolver enforces, the inability to revert to a known working assembly, silent degradation in place of surfaced breaking changes, and lineage that can hardly be approximated. To illustrate the scale involved, the authors measure a reference stack of 48 production-grade open-source projects, which declares 4,664 direct dependencies, resolves to 11,508 transitive packages, and totals roughly 392 million lines of code.
📝 Abstract
AI systems rest on software with weak integrity mechanisms, leaving them exposed at every stage from data acquisition to final inference. This paper makes the AI supply chain a first-class object of analysis, decomposing it across four architectural layers: data acquisition, model training, model inference, and a cross-cutting substrate. Within these layers, we identify four structural gaps that traditional supply chain mechanisms do not address: verifiability, versioning, observability, and traceability. Current AI systems fall short on all of them: they carry undeclared behavioral couplings that no resolver enforces; they cannot be reverted to known working assemblies; they degrade silently rather than surfacing breaking changes; and their lineage can hardly be approximated. To illustrate the scale of the software supply chain of AI, we measure a reference stack of 48 production-grade open-source projects, which declares 4,664 direct dependencies, resolves to 11,508 transitive packages, and totals roughly 392M lines of code.
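The distinction between declared direct dependencies and resolved transitive packages can be illustrated with a minimal Python sketch. It is not the authors' measurement tooling. The package names and the graph are hypothetical, and the sketch assumes that "transitive" means every package reachable from the projects, including the direct ones; the abstract does not say how the paper counts them.

```python
from collections import deque


def transitive_closure(direct, roots):
    """Return every package reachable from `roots` by following `direct` edges."""
    seen = set()
    queue = deque(roots)
    while queue:
        pkg = queue.popleft()
        for dep in direct.get(pkg, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical toy graph: each key declares the packages in its list.
direct_deps = {
    "trainer": ["tensorlib", "dataloader"],
    "server": ["tensorlib", "httpkit"],
    "tensorlib": ["blas-shim"],
    "dataloader": ["codec", "blas-shim"],
    "httpkit": ["codec"],
}
projects = ["trainer", "server"]

# Packages the projects declare themselves.
declared = {dep for p in projects for dep in direct_deps.get(p, ())}
# Packages a resolver would actually pull in.
resolved = transitive_closure(direct_deps, projects)

print(len(declared), "declared,", len(resolved), "resolved")
print("pulled in only transitively:", sorted(resolved - declared))
```

Even in this toy graph, two of the five resolved packages never appear in any project's own manifest, which is the kind of gap between declared and resolved counts that the abstract reports at much larger scale.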
Problem

Research questions and friction points this paper is trying to address.

AI supply chain
verifiability
versioning
observability
traceability
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI supply chain
verifiability
versioning
observability
traceability