Biomedical Open Source Software: Crucial Packages and Hidden Heroes

📅 2024-04-10

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Problem: Foundational open-source libraries that underpin biomedical research tools, such as NumPy and base R, are frequently overlooked. Their contributions are rarely quantified and go largely unrecognized in research policy.

Method: The authors build software dependency graphs for three ecosystems (PyPI, CRAN, Bioconductor), using the CZ Software Mentions Dataset to identify the software actually mentioned in biomedical papers. They propose a centrality assessment framework for software ecosystems that applies graph-theoretic metrics, including betweenness and closeness centrality, to quantify the structural importance of foundational libraries.

Contribution/Results: The analysis identifies several "hidden hero" libraries whose critical role is supported by reproducible, data-driven evidence. This gives funding agencies an empirical basis for prioritizing investment in essential software infrastructure, which in turn supports the sustainability and reliability of the biomedical research ecosystem.
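To make the idea of centrality in a dependency graph concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the package names and edges are hypothetical, and the two scores (transitive dependent count and a reach-scaled closeness over reversed dependency edges) are simple stand-ins for the metrics the paper proposes.

```python
from collections import deque

# Hypothetical toy data: package -> packages it depends on directly.
# These names and edges are illustrative only, not from the paper's dataset.
DEPENDS_ON = {
    "pipeline_app": ["table_lib", "stats_lib"],
    "table_lib": ["array_lib"],
    "stats_lib": ["array_lib"],
    "plot_app": ["array_lib"],
    "array_lib": [],
}


def reverse_graph(depends_on):
    """Map each package to the set of packages that depend on it directly."""
    dependents = {pkg: set() for pkg in depends_on}
    for pkg, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)
    return dependents


def bfs_distances(graph, start):
    """Shortest hop counts from start to every node reachable in graph."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def transitive_dependents(depends_on, pkg):
    """Number of packages that rely on pkg directly or indirectly."""
    dist = bfs_distances(reverse_graph(depends_on), pkg)
    return len(dist) - 1  # exclude pkg itself


def closeness(depends_on, pkg):
    """Closeness of pkg to its dependents, scaled by the fraction reached."""
    graph = reverse_graph(depends_on)
    dist = bfs_distances(graph, pkg)
    reached = len(dist) - 1
    if reached == 0 or len(graph) < 2:
        return 0.0
    total = sum(dist.values())
    return (reached / total) * (reached / (len(graph) - 1))


def rank_packages(depends_on):
    """Packages sorted by transitive dependents, then by name for stable ties."""
    names = reverse_graph(depends_on).keys()
    scored = [(name, transitive_dependents(depends_on, name)) for name in names]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for name, count in rank_packages(DEPENDS_ON):
        print(f"{name}: dependents={count}, closeness={closeness(DEPENDS_ON, name):.2f}")
```

In this toy graph the hypothetical `array_lib` is imported directly by only some packages, yet every other package reaches it through the dependency chain. That is the "hidden" pattern the summary describes: a library that end users rarely name but that most visible tools rest on.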

📝 Abstract

Despite the importance of scientific software for research, it is often not formally recognized and rewarded. This is especially true for foundation libraries, which are used by the software packages visible to users while remaining "hidden" themselves. Funders and other organizations need to understand the complex network of computer programs that modern research relies upon. In this work we used the CZ Software Mentions Dataset to map the dependencies of the software used in biomedical papers and to find the packages critical to the software ecosystems. We propose centrality metrics for the network of software dependencies, analyze three ecosystems (PyPI, CRAN, Bioconductor), and determine the packages with the highest centrality.

Problem

Research questions and friction points this paper is trying to address.

Software Dependency Mapping

Biomedical Software Ecosystem

Hidden Foundation Libraries

Innovation

Methods, ideas, or system contributions that make the work stand out.

Software Centrality Metrics

Biomedical Research Dependencies

Hidden Tools Significance

🔎 Similar Papers

Learning and teaching biological data science in the Bioconductor community