Exploring Code Comprehension in Scientific Programming: Preliminary Insights from Research Scientists

📅 2025-01-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Poor code readability in scientific software hinders cross-team collaboration and research reproducibility, particularly among self-taught researchers, who typically lack formal training in readability best practices and consequently produce opaque naming and inadequate documentation. This study uses a mixed-methods approach (surveys, in-depth interviews, and statistical analysis) with 57 interdisciplinary researchers to investigate current practices empirically. It finds that, in the absence of structured training, researchers rely heavily on informal, ad hoc commenting practices, and it identifies large language models (LLMs) as an emerging means of improving code quality. Results show that 57.9% of participants received no readability-specific instruction, with inconsistent naming and missing documentation identified as the two primary bottlenecks. Based on these findings, the authors propose a lightweight, human-centered code quality support framework tailored to scientific programmers, addressing a gap in the human-factors literature on scientific code readability.

📝 Abstract
Scientific software (computer programs, scripts, or code used in scientific research, data analysis, modeling, or simulation) has become central to modern research. However, there is limited research on the readability and understandability of scientific code, both of which are vital for effective collaboration and reproducibility in scientific research. This study surveys 57 research scientists from various disciplines to explore their programming backgrounds, practices, and the challenges they face regarding code readability. Our findings reveal that most participants learn programming through self-study or on-the-job training, with 57.9% lacking formal instruction in writing readable code. Scientists mainly use Python and R, relying on comments and documentation for readability. While most consider code readability essential for scientific reproducibility, they often face issues with inadequate documentation and poor naming conventions, including cryptic names and inconsistent styles. Our findings also show low adoption of code quality tools and a trend toward using large language models to improve code quality. These findings offer practical insights into enhancing coding practices and supporting sustainable development in scientific software.
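To make the readability issues the abstract describes concrete, here is a small illustrative sketch (not taken from the paper; the function names and data are invented) contrasting the cryptic style the survey associates with untrained scientific code against a version using descriptive names and a docstring, in Python, one of the two languages the participants mainly use:

```python
# Cryptic style: single-letter names give no hint of intent,
# a pattern the surveyed scientists cite as a key readability barrier.
def f(x, t):
    return [v for v in x if v > t]


# Readable equivalent: descriptive names plus a docstring make the
# intent clear without any external context or extra comments.
def filter_above_threshold(measurements, threshold):
    """Return the measurements strictly greater than threshold."""
    return [value for value in measurements if value > threshold]


# Hypothetical sensor readings used only to demonstrate the call.
readings = [0.2, 1.5, 0.9, 2.1]
print(filter_above_threshold(readings, 1.0))  # [1.5, 2.1]
```

Both functions compute the same result; only the second communicates what it computes, which is the kind of low-cost practice a lightweight support framework could encourage.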
Problem

Research questions and friction points this paper is trying to address.

Scientific Software Readability
Code Documentation
Variable Naming
Innovation

Methods, ideas, or system contributions that make the work stand out.

Code Readability
Scientific Programming Education
Large Language Models in Code Quality