AI Summary
This study empirically characterizes, for the first time, the full lifecycle practices of pre-trained models (PTMs) in open-source software (OSS), focusing on integration, evolution, testing, and maintenance challenges. We conduct large-scale mining of GitHub repositories, coupled with cross-platform PTM dependency tracing (Hugging Face, PyTorch Hub), historical commit and issue log analysis, and model metadata parsing. Our analysis systematically identifies recurring risks, including dependency staleness, inadequate documentation, and insufficient test coverage. We propose a novel software engineering analysis framework specifically designed for model dependencies, addressing critical gaps in PTM operationalization and sustainability research. As concrete outcomes, we deliver a reusable PTM maintenance practice guide and a prototype detection tool. These contributions provide both theoretical foundations and practical support for enhancing the maintainability and engineering rigor of AI models within software systems.
Abstract
Pre-trained models (PTMs) are becoming a common component in open-source software (OSS) development, yet their roles, maintenance practices, and lifecycle challenges remain underexplored. This report presents a plan for an exploratory study of how PTMs are used, maintained, and tested in OSS projects, focusing on models hosted on platforms like Hugging Face and PyTorch Hub. We plan to mine software repositories that use PTMs and analyze their codebases, historical data, and reported issues. The study aims to provide actionable insights for improving the use and sustainability of PTMs in open-source projects, and to take a step toward a foundation for advancing software engineering practices in the context of model dependencies.