🤖 AI Summary
Foundational open-source software libraries, such as NumPy and base R, underpin biomedical research tools but are frequently overlooked: their contributions are rarely quantified and go largely unrecognized in research policy. Method: We construct a cross-ecosystem (PyPI, CRAN, Bioconductor) software dependency graph, using the CZ Software Mentions Dataset to identify which software is actually mentioned in biomedical papers. We propose a centrality assessment framework tailored to software ecosystems that integrates graph-theoretic metrics, including betweenness and closeness centrality, to quantify the structural importance of foundational libraries. Contribution/Results: Our analysis uncovers several "hidden hero" libraries whose critical role is supported by reproducible, data-driven evidence. This work gives funding agencies an empirical basis for prioritizing investment in essential software infrastructure, which in turn supports the sustainability and reliability of the biomedical research ecosystem.
📝 Abstract
Despite its importance for research, scientific software is often not formally recognized or rewarded. This is especially true for foundational libraries, which are used by the software packages visible to users while remaining "hidden" themselves. Funders and other organizations need to understand the complex network of computer programs that modern research relies upon. In this work, we use the CZ Software Mentions Dataset to map the dependencies of the software used in biomedical papers and identify the packages critical to the software ecosystems. We propose centrality metrics for the network of software dependencies, analyze three ecosystems (PyPI, CRAN, Bioconductor), and determine the packages with the highest centrality.
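
To make the idea of ranking packages by their position in a dependency network concrete, here is a minimal sketch. It is not the paper's method: the package names and the `DEPENDS_ON` data are hypothetical, and the score used here (the number of packages that rely on a library directly or indirectly) is only a simple stand-in for the centrality metrics the abstract refers to.

```python
from collections import defaultdict, deque

# Hypothetical toy data: each package maps to the packages it depends on.
# These names are illustrative and do not come from the paper.
DEPENDS_ON = {
    "plot_app": ["stats_lib", "array_core"],
    "seq_tool": ["stats_lib"],
    "stats_lib": ["array_core"],
    "array_core": [],
}


def reverse_edges(depends_on):
    """Build a map from each package to the set of its direct dependents."""
    dependents = defaultdict(set)
    for pkg, deps in depends_on.items():
        dependents[pkg]  # make sure every package has an entry
        for dep in deps:
            dependents[dep].add(pkg)
    return dict(dependents)


def transitive_dependents(depends_on):
    """Count how many packages rely on each package, directly or indirectly."""
    dependents = reverse_edges(depends_on)
    counts = {}
    for start in dependents:
        seen = set()
        queue = deque([start])
        # Breadth-first walk "upwards" from a library to everything built on it.
        while queue:
            node = queue.popleft()
            for parent in dependents.get(node, ()):
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        seen.discard(start)  # a package does not count as its own dependent
        counts[start] = len(seen)
    return counts


def rank_packages(depends_on):
    """Sort packages by descending dependent count, then by name."""
    counts = transitive_dependents(depends_on)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for name, score in rank_packages(DEPENDS_ON):
        print(f"{name}: {score}")
```

In this toy graph the library that no user installs directly (`array_core`) ranks first, which is the "hidden" effect the abstract describes. A real analysis would replace the toy dictionary with dependency metadata from each ecosystem and would likely use an established graph library for measures such as betweenness or closeness.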