Hamster: A Large-Scale Study and Characterization of Developer-Written Tests

📅 2025-09-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the gap between real-world developer testing practices and automated test generation (ATG) techniques. Method: We conduct a large-scale empirical analysis of 1.7 million manually written Java test cases from open-source projects, characterizing their key features across five dimensions—test scope, fixture design, assertion patterns, input types, and mocking usage—and benchmarking them against two state-of-the-art ATG tools, EvoSuite and Randoop. Contribution/Results: Our analysis reveals, for the first time at scale, that the majority of human-written tests incorporate complex contextual dependencies, domain-specific semantic assertions, and fine-grained mocking logic—capabilities largely absent in current ATG tools. The study establishes the first large-scale empirical benchmark for aligning ATG with industrial practice and provides concrete, actionable directions for improving ATG’s fidelity to real-world testing needs.

📝 Abstract
Automated test generation (ATG), which aims to reduce the cost of manual test suite development, has been investigated for decades and has produced countless techniques based on a variety of approaches: symbolic analysis, search-based, random and adaptive-random, learning-based, and, most recently, large-language-model-based approaches. However, despite this large body of research, there is still a gap in our understanding of the characteristics of developer-written tests and, consequently, in our assessment of how well ATG techniques and tools can generate realistic and representative tests. To bridge this gap, we conducted an extensive empirical study of developer-written tests for Java applications, covering 1.7 million test cases from open-source repositories. Our study is the first of its kind in studying aspects of developer-written tests that are mostly neglected in the existing literature, such as test scope, test fixtures and assertions, types of inputs, and use of mocking. Based on the characterization, we then compare existing tests with those generated by two state-of-the-art ATG tools. Our results highlight that a vast majority of developer-written tests exhibit characteristics that are beyond the capabilities of current ATG tools. Finally, based on the insights gained from the study, we identify promising research directions that can help bridge the gap between current tool capabilities and more effective tool support for developer testing practices. We hope that this work can set the stage for new advances in the field and bring ATG tools closer to generating the types of tests developers write.
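To make the abstract's claim concrete, the sketch below (hypothetical; class and method names are illustrative, not taken from the paper or its dataset) shows the kind of developer-written test the study characterizes: a fixture that builds contextual state, a hand-rolled mock that records interactions with a collaborator, and domain-specific semantic assertions. Tool-generated tests from EvoSuite or Randoop typically exercise the unit with random inputs and assert observed return values, rather than composing this kind of scenario.

```java
import java.util.ArrayList;
import java.util.List;

public class InvoiceServiceTest {
    // Collaborator that the unit under test depends on.
    interface TaxRateProvider { double rateFor(String region); }

    // Unit under test: computes an invoice total with a region-specific tax rate.
    static class InvoiceService {
        private final TaxRateProvider taxes;
        InvoiceService(TaxRateProvider taxes) { this.taxes = taxes; }
        double total(List<Double> lineItems, String region) {
            double subtotal = 0.0;
            for (double item : lineItems) subtotal += item;
            return subtotal * (1.0 + taxes.rateFor(region));
        }
    }

    public static void main(String[] args) {
        // Fixture: contextual state an ATG tool would have to synthesize.
        List<Double> cart = new ArrayList<>();
        cart.add(10.0);
        cart.add(5.0);

        // Fine-grained mocking: stub the collaborator and record how it is used.
        List<String> queriedRegions = new ArrayList<>();
        TaxRateProvider fakeTaxes = region -> {
            queriedRegions.add(region);
            return 0.10; // fixed 10% rate for this test
        };

        InvoiceService service = new InvoiceService(fakeTaxes);
        double total = service.total(cart, "EU");

        // Domain-specific semantic assertions, not just raw value equality.
        if (Math.abs(total - 16.5) > 1e-9)
            throw new AssertionError("total should include 10% tax on 15.0");
        if (!queriedRegions.equals(List.of("EU")))
            throw new AssertionError("tax rate should be looked up exactly once, for the invoice's region");
        System.out.println("ok");
    }
}
```

In a real project the same test would be written with JUnit and a mocking library such as Mockito; the plain-Java form above just keeps the three characteristics (fixture, mock, semantic assertion) visible without framework dependencies.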
Problem

Research questions and friction points this paper is trying to address.

Characterizing developer-written test cases' scope and structure
Assessing automated test generation tools' capability gaps
Identifying research directions for improving test generation realism
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conducted large-scale empirical study of developer tests
Analyzed test scope, fixtures, assertions, and mocking characteristics
Compared developer-written tests with tests generated by automated tools
Rangeet Pan
Staff Research Scientist, IBM Research, Yorktown Heights
Software Engineering · Programming Language · Large Language Models
Tyler Stennett
Georgia Institute of Technology, Atlanta, GA, USA
Raju Pavuluri
IBM Research, Yorktown Heights, NY, USA
Nate Levin
Georgia Institute of Technology, Atlanta, GA, USA
Alessandro Orso
Georgia Institute of Technology, Atlanta, GA, USA
Saurabh Sinha
IBM Research, Yorktown Heights, NY, USA