🤖 AI Summary
This study addresses the pervasive lack of verifiability, version control, observability, and traceability across the end-to-end AI pipeline—from data acquisition to inference. It proposes the first systematic modeling of the AI software supply chain as a four-layer architecture encompassing data ingestion, model training, model inference, and underlying infrastructure. Through this framework, the work identifies four structural security gaps overlooked by conventional mechanisms: behavioral coupling, insufficient rollback capability, absence of change awareness, and difficulties in lineage tracking. Leveraging software supply chain analysis, dependency resolution, and large-scale empirical measurement, the authors evaluate 48 production-grade open-source projects—comprising 4,664 direct and 11,508 transitive dependencies, totaling approximately 392 million lines of code—thereby quantifying the complexity and scale of latent risks inherent in modern AI supply chains.
📝 Abstract
AI systems rest on software with low integrity mechanisms, leaving AI systems exposed across every stage from data acquisition to final inference. This paper makes the AI supply chain a first-class object of analysis, decomposing it across four architectural layers: data acquisition, model training, model inference, and a cross-cutting substrate. Within these layers, we identify four structural gaps that traditional supply chain mechanisms do not address: verifiability, versioning, observability, and traceability.Current AI systems fall short on all of them: they carry undeclared behavioral couplings that no resolver enforces; they cannot be reverted back to known working assemblies; they degrade silently rather than surfacing breaking changes; and their lineage can hardly be approximated. To illustrate the scale of the software supply chain of AI, we measure a reference stack of 48 production-grade open-source projects, which declares 4,664 direct dependencies, resolves to 11,508 transitive packages, and totals roughly 392M lines of code.