How Students Use Generative AI for Software Testing: An Observational Study

📅 2025-10-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates how novice developers interact with generative AI (e.g., ChatGPT) in unit testing, focusing on collaboration patterns, degree of reliance on the AI, and impacts on learning outcomes, code quality, and developer identity. Method: a mixed-methods approach combining naturalistic observation, prompt-engineering experiments, and multidimensional evaluation, including mutation score, test smells, cognitive load, and trust metrics. Contribution/Results: the study identifies four human–AI collaboration strategies. While prompting style significantly influences the generation process, it does not improve final test quality. Generative AI substantially reduces test-writing time and cognitive load; however, it raises concerns about output reliability, accountability, and potential long-term inhibition of skill acquisition. The findings provide empirical grounding and design implications for integrating AI into software engineering education.

📝 Abstract
The integration of generative AI tools like ChatGPT into software engineering workflows opens up new opportunities to boost productivity in tasks such as unit test engineering. However, these AI-assisted workflows can also significantly alter the developer's role, raising concerns about control, output quality, and learning, particularly for novice developers. This study investigates how novice software developers with foundational knowledge in software testing interact with generative AI for engineering unit tests. Our goal is to examine the strategies they use, how heavily they rely on generative AI, and the benefits and challenges they perceive when using generative AI-assisted approaches for test engineering. We conducted an observational study involving 12 undergraduate students who worked with generative AI for unit testing tasks. We identified four interaction strategies, defined by whether the test idea or the test implementation originated from generative AI or the participant. Additionally, we singled out prompting styles that focused on one-shot or iterative test generation, which often aligned with the broader interaction strategy. Students reported benefits including time-saving, reduced cognitive load, and support for test ideation, but also noted drawbacks such as diminished trust, test quality concerns, and lack of ownership. While strategy and prompting styles influenced workflow dynamics, they did not significantly affect test effectiveness or test code quality as measured by mutation score or test smells.
Problem

Research questions and friction points this paper is trying to address.

Investigating novice developers' interaction with generative AI for unit testing
Examining reliance levels and perceived benefits of AI-assisted test engineering
Identifying interaction strategies and challenges in AI-generated test workflows
Innovation

Methods, ideas, or system contributions that make the work stand out.

Observed student-AI interaction strategies for testing
Analyzed prompting styles for test generation
Evaluated AI impact on test effectiveness
Baris Ardic
Computer Science, Delft University of Technology, Delft, 2628XE, Netherlands.
Quentin Le Dilavrec
Computer Science, Delft University of Technology, Delft, 2628XE, Netherlands.
Andy Zaidman
Professor of Software Engineering, Delft University of Technology
software engineering, software evolution, software testing, empirical software engineering, mining software repositories