Testing with AI Agents: An Empirical Study of Test Generation Frequency, Quality, and Coverage

📅 2026-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the lack of systematic understanding of the frequency, quality, and coverage effectiveness of AI-generated tests in real-world software development. Leveraging the AIDev dataset, we conduct an empirical analysis of 2,232 test-related commits, integrating data mining, code structure parsing, and coverage evaluation to provide the first large-scale characterization of AI-generated tests in practice. Our findings reveal that AI contributes 16.4% of all test commits. AI-generated tests are notably longer, exhibit higher assertion density, and follow more linear control flow than human-written tests. Crucially, their code coverage is on par with that of manually authored tests, and they frequently improve overall project coverage across multiple repositories.

Technology Category

Application Category

📝 Abstract
Agent-based coding tools have transformed software development practices. Unlike prompt-based approaches that require developers to manually integrate generated code, these agent-based tools autonomously interact with repositories to create, modify, and execute code, including test generation. While many developers have adopted agent-based coding tools, little is known about how these tools generate tests in real-world development scenarios or how AI-generated tests compare to human-written ones. This study presents an empirical analysis of test generation by agent-based coding tools using the AIDev dataset. We extracted 2,232 commits containing test-related changes and investigated three aspects: the frequency of test additions, the structural characteristics of the generated tests, and their impact on code coverage. Our findings reveal that (i) AI authored 16.4% of all commits adding tests in real-world repositories, (ii) AI-generated test methods exhibit distinct structural patterns, featuring longer code and a higher density of assertions while maintaining lower cyclomatic complexity through linear logic, and (iii) AI-generated tests contribute to code coverage comparable to human-written tests, frequently achieving positive coverage gains across several projects.
Keywords

AI agents
test generation
code coverage
empirical study
software testing