Improving Software Engineering in Biostatistics: Challenges and Opportunities

📅 2023-01-24
📈 Citations: 2
Influential: 1
🤖 AI Summary
This study systematically identifies critical barriers in biostatistical software development: information silos across roles and departments lead to redundant implementation; low code readability and reusability hinder maintainability; and the absence of standardized version control and testing practices severely compromises result reproducibility and long-term sustainability. To address these challenges, it proposes, for the first time, a lightweight, discipline-aware software engineering framework tailored to biostatistics. The framework combines foundational practices (Git-based version control, unit testing, and documentation standards) with cross-functional collaboration mechanisms, domain-adapted code review protocols, and structured knowledge-sharing strategies. The result is a scalable, field-tested capability-building guide that demonstrably improves the reliability and reproducibility of statistical modeling code, makes team collaboration more efficient, and narrows the gap between statistical practice and software engineering principles.
📝 Abstract
Programming is ubiquitous in applied biostatistics; adopting software engineering skills will help biostatisticians do a better job. To explain this, we start by highlighting key challenges for software development and application in biostatistics. Silos between different statistician roles, projects, departments
Problem

Research questions and friction points this paper is trying to address.

Addressing duplicate code from siloed statistician roles and projects
Ensuring reliable software through readable code and testing frameworks
Improving reproducibility by overcoming manual workflows and uncontrolled development
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adopting software engineering skills in biostatistics
Establishing dedicated software engineering teams
Using tools to improve reproducibility of analyses
Daniel Sabanés Bové
RCONIS / inferential.biostatistics GmbH, Friedrichstrasse 12, 4055 Basel, Switzerland
Heidi Seibold
Digital Research Academy, Bayerstr. 77c, 80335 Munich, Germany
Anne-Laure Boulesteix
Ludwig-Maximilians-Universität München
biostatistics, computational statistics, metascience
Juliane Manitz
R Validation Hub
Alessandro Gasparini
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, PO Box 281, 17177 Stockholm, Sweden
Burak K. Günhan
Merck Healthcare KGaA, Frankfurter Strasse 250, 64293 Darmstadt, Germany
Oliver Boix
Bayer AG, Aprather Weg 18a, 42113 Wuppertal, Germany
Armin Schüler
MorphoSys GmbH, Semmelweisstr. 7, 82152 Planegg, Germany
Sven Fillinger
Quantitative Biology Center (QBiC), University of Tübingen, 72076 Tübingen, Germany
Sven Nahnsen
Quantitative Biology Center (QBiC), University of Tübingen, 72076 Tübingen, Germany
Anna E. Jacob
Institute for Medical Information Processing, Biometry and Epidemiology, Faculty of Medicine, LMU Munich, 81377 Munich, Germany
Thomas Jaki
Professor of Statistics, University of Regensburg and University of Cambridge
Pre-clinical Statistics, Early Phase Clinical Trials, Adaptive Designs