🤖 AI Summary
Scientific software testing faces unique challenges, including difficult test case design, ambiguous oracle determination, the absence of quality assessment standards, and the poor applicability of industrial testing tools. Method: We conducted the first large-scale empirical study of this space, combining a structured survey of 217 scientific software developers with qualitative analysis and statistical testing to examine variations in testing practices, tool adoption, and demographic factors. Contribution/Results: We identify three core bottlenecks: test design, result validation, and quality measurement. We further reveal a widespread lack of awareness of, and access to, domain-specific testing tools. This work establishes the first empirical evidence of a paradigmatic divergence between scientific and conventional software testing, and advocates for lightweight, extensible, and computationally aware testing frameworks tailored to scientific computing. Our findings provide foundational evidence and strategic direction for advancing domain-specific testing methodology.
📝 Abstract
Context: Research software is essential for developing advanced tools and models that solve complex research problems and drive innovation across domains. Ensuring its correctness is therefore crucial, and software testing plays a vital role in this task. However, testing research software is challenging due to the software's complexity and the unique culture of the research software community. Aims: Building on previous research, this study provides an in-depth investigation of testing practices in research software, focusing on test case design, challenges with expected outputs, use of quality metrics, execution methods, tools, and desired tool features. Additionally, we explore whether demographic factors influence testing processes. Method: We survey research software developers to understand how they design test cases, handle output challenges, use metrics, execute tests, and select tools. Results: Research software testing varies widely. The primary challenges are designing test cases, evaluating test quality, and evaluating the correctness of test outputs. Overall, research software developers are not familiar with existing testing tools and need new tools tailored to their specific requirements. Conclusion: Allocating human resources to testing and equipping developers with knowledge of effective testing techniques are important steps toward improving the testing process of research software. While many industrial testing tools exist, they are inadequate for testing research software because of its complexity, specialized algorithms, continuous updates, and need for flexible, custom testing approaches. Access to a standard set of testing tools that addresses these special characteristics will increase the level of testing in research software development and reduce the overhead of distributing knowledge about software testing.
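One of the challenges named above, evaluating the correctness of test outputs when no exact expected value exists (the test oracle problem), is often sidestepped in practice with metamorphic relations: instead of checking an output against a known answer, the test checks that outputs of related inputs satisfy a property the computation must obey. The sketch below is illustrative only; the `trapezoid` integrator and the additivity relation are our own assumptions, not artifacts from the study.

```python
import math

def trapezoid(f, a, b, n=1000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# f has no closed-form antiderivative, so there is no exact oracle
# for the integrator's output.
f = lambda x: math.exp(-x * x)

# Metamorphic relation: the integral over [a, b] must equal the sum of
# the integrals over [a, c] and [c, b]. Matching the grids (n=2000 vs.
# 2 x n=1000) makes the two sides agree to floating-point rounding.
whole = trapezoid(f, 0.0, 2.0, n=2000)
parts = trapezoid(f, 0.0, 1.0, n=1000) + trapezoid(f, 1.0, 2.0, n=1000)
assert math.isclose(whole, parts, rel_tol=1e-9)
```

A tolerance-based comparison (`math.isclose`, or `pytest.approx` in a test suite) is the other common device here: it acknowledges that floating-point results are only approximately reproducible, which is exactly the "challenges with expected outputs" the survey probes.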