🤖 AI Summary
To address the lack of adaptation and evolution analysis tools for AI software, this paper proposes the Neural Network Bill of Materials (NNBOM) model—the first framework enabling empirical evolutionary studies across large-scale AI software ecosystems. Leveraging 55,997 open-source PyTorch projects, we construct an NNBOM database that integrates Software Bill of Materials (SBOM) principles with AI-specific artifacts, systematically characterizing long-term evolutionary patterns of pre-trained models and modular components in terms of scale growth, cross-domain dependencies, and reuse practices. Methodologically, we combine empirical software engineering with data mining techniques to uncover AI-specific evolutionary paradigms distinct from traditional software. Our contributions include: (1) the first scalable NNBOM data model and supporting empirical infrastructure; and (2) two prototype tools—a multi-repository collaborative evolution analysis platform and a single-repository component assessment and recommendation system—to aid developer decision-making.
📝 Abstract
Neural networks have become integral to many fields due to their exceptional performance. The open-source community has seen a rapid influx of neural network (NN) repositories with fast-paced iterations, making it crucial for practitioners to analyze their evolution in order to guide development and stay ahead of trends. While extensive research has explored traditional software evolution using Software Bills of Materials (SBOMs), these are ill-suited for NN software, which relies on pre-defined modules and pre-trained models (PTMs) with distinct component structures and reuse patterns. Conceptual AI Bills of Materials (AIBOMs) also lack practical implementations for large-scale evolutionary analysis. To fill this gap, we introduce the Neural Network Bill of Materials (NNBOM), a comprehensive dataset construct tailored for NN software. We create a large-scale NNBOM database from 55,997 curated PyTorch GitHub repositories, cataloging their third-party libraries (TPLs), PTMs, and modules. Leveraging this database, we conduct a comprehensive empirical study of NN software evolution across software scale, component reuse, and inter-domain dependency, providing maintainers and developers with a holistic view of long-term trends. Building on these findings, we develop two prototype applications, Multi repository Evolution Analyzer and Single repository Component Assessor and Recommender, to demonstrate the practical value of our analysis.