🤖 AI Summary
Selecting third-party packages in open-source ecosystems is hampered by the absence of standardized evaluation criteria, hidden dependency risks, and insufficient transparency in AI-driven recommendations. To address these issues, we propose PySelect: the first AI-assisted package selection system integrating a Multi-Criteria Decision-Making (MCDM) framework with an empirically grounded, dynamic knowledge graph. PySelect automatically constructs this graph by fusing heterogeneous metadata from PyPI, GitHub, and Stack Overflow, covering security vulnerabilities, usage trends, developer sentiment, and code-level dependencies. It uses large language models to interpret user requirements and generate context-aware, interpretable, and reproducible recommendations. Evaluated on roughly 800,000 real-world Python scripts (798,669 scripts from 16,887 GitHub repositories), PySelect achieves high metadata extraction precision and improves recommendation quality over generative AI baselines. A user study based on the Technology Acceptance Model (TAM) reports positive evaluations of its usefulness and ease of use.
📝 Abstract
Selecting third-party software packages in open-source ecosystems like Python is challenging due to the large number of alternatives and limited transparent evidence for comparison. Generative AI tools are increasingly used in development workflows, but their suggestions often overlook dependency evaluation, emphasize popularity over suitability, and lack reproducibility. This creates risks for projects that require transparency, long-term reliability, maintainability, and informed architectural decisions. This study formulates software package selection as a Multi-Criteria Decision-Making (MCDM) problem and proposes a data-driven framework for technology evaluation. Automated data pipelines continuously collect and integrate software metadata, usage trends, vulnerability information, and developer sentiment from GitHub, PyPI, and Stack Overflow. These data are structured into a decision model representing relationships among packages, domain features, and quality attributes. The framework is implemented in PySelect, a decision support system that uses large language models to interpret user intent and query the model to identify contextually appropriate packages. The approach is evaluated using 798,669 Python scripts from 16,887 GitHub repositories and a user study based on the Technology Acceptance Model. Results show high data extraction precision, improved recommendation quality over generative AI baselines, and positive user evaluations of usefulness and ease of use. This work introduces a scalable, interpretable, and reproducible framework that supports evidence-based software selection using MCDM principles, empirical data, and AI-assisted intent modeling.
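To make the MCDM framing concrete, the following is a minimal sketch of one common MCDM technique, a weighted sum over min-max normalized criteria. It is an illustration only and is not taken from PySelect: the abstract does not state which MCDM method, criteria, or weights the system uses, so the criterion names, the weights, and the package names `pkg_a` and `pkg_b` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical criteria and weights (assumed for illustration; they sum to 1).
# Higher raw values are treated as better for every criterion.
WEIGHTS: Dict[str, float] = {
    "popularity": 0.2,
    "maintenance": 0.3,
    "security": 0.3,
    "sentiment": 0.2,
}


def normalize(packages: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Min-max normalize each criterion to [0, 1] across all candidates."""
    result: Dict[str, Dict[str, float]] = {name: {} for name in packages}
    for criterion in WEIGHTS:
        values = [packages[name][criterion] for name in packages]
        lowest, highest = min(values), max(values)
        span = highest - lowest
        for name in packages:
            # If every candidate ties on a criterion, it cannot discriminate;
            # give all candidates full credit for it.
            if span == 0:
                result[name][criterion] = 1.0
            else:
                result[name][criterion] = (packages[name][criterion] - lowest) / span
    return result


def rank(packages: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return (package, score) pairs sorted from best to worst weighted score."""
    normalized = normalize(packages)
    scores = {
        name: sum(WEIGHTS[c] * normalized[name][c] for c in WEIGHTS)
        for name in normalized
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up candidate data, not measurements of any real package.
    candidates = {
        "pkg_a": {"popularity": 100, "maintenance": 0.9, "security": 0.8, "sentiment": 0.7},
        "pkg_b": {"popularity": 1000, "maintenance": 0.5, "security": 0.4, "sentiment": 0.6},
    }
    for name, score in rank(candidates):
        print(f"{name}: {score:.2f}")
```

In this toy input the more popular `pkg_b` ranks below `pkg_a` because maintenance and security carry more weight, which mirrors the abstract's concern that popularity alone is a poor proxy for suitability. Because the weights and inputs are explicit, the same query always yields the same ranking, which is the reproducibility property the abstract emphasizes.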