Change And Cover: Last-Mile, Pull Request-Based Regression Test Augmentation

📅 2026-01-16
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work targets the “last-mile” regression testing gap in pull requests (PRs): lines modified by a PR that the existing test suite leaves uncovered. It proposes Change And Cover (ChaCo), a context-aware approach driven by a large language model (LLM) that generates supplementary tests matching the style of the existing suite. The method measures PR-specific patch coverage, extracts relevant test context (existing test functions, fixtures, and data generators), integrates the generated tests into the existing suite, and summarizes the additions for developer review. In an evaluation on 145 PRs from SciPy, Qiskit, and Pandas, it helps 30% of PRs reach full patch coverage at a generation cost of $0.11 per PR. Human reviewers rated the tests as worth adding (4.53/5.0); of the 12 generated tests submitted upstream, 8 have been merged, and two previously unknown bugs were exposed and fixed.

📝 Abstract
Software is in constant evolution, with developers frequently submitting pull requests (PRs) to introduce new features or fix bugs. Testing PRs is critical to maintaining software quality. Yet, even in projects with extensive test suites, some PR-modified lines remain untested, leaving a "last-mile" regression test gap. Existing test generators typically aim to improve overall coverage, but do not specifically target the uncovered lines in PRs. We present Change And Cover (ChaCo), an LLM-based test augmentation technique that addresses this gap. It makes three contributions: (i) ChaCo considers the PR-specific patch coverage, offering developers augmented tests for code just when it is on the developers' mind. (ii) We identify providing suitable test context as a crucial challenge for an LLM to generate useful tests, and present two techniques to extract relevant test content, such as existing test functions, fixtures, and data generators. (iii) To make augmented tests acceptable for developers, ChaCo carefully integrates them into the existing test suite, e.g., by matching the test's structure and style with the existing tests, and generates a summary of the test addition for developer review. We evaluate ChaCo on 145 PRs from three popular and complex open-source projects: SciPy, Qiskit, and Pandas. The approach successfully helps 30% of PRs achieve full patch coverage, at the cost of $0.11, showing its effectiveness and practicality. Human reviewers find the tests to be worth adding (4.53/5.0), well integrated (4.2/5.0), and relevant to the PR (4.7/5.0). Ablations show test context is crucial for context-aware test generation, leading to 2x coverage. We submitted 12 tests, of which 8 have already been merged, and two previously unknown bugs were exposed and fixed. We envision our approach to be integrated into CI workflows, automating the last mile of regression test augmentation.
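
To make the notion of patch coverage in the abstract concrete, here is a minimal Python sketch. It is not ChaCo's implementation. It assumes two hypothetical inputs that a real pipeline would derive from the PR diff and from a coverage run: a mapping from file path to the executable lines the PR changed, and a mapping from file path to the lines the test suite executed. The names `PatchCoverage` and `patch_coverage` are invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchCoverage:
    """Changed lines split into those the tests hit and those they missed."""
    covered: frozenset
    uncovered: frozenset

    @property
    def ratio(self) -> float:
        total = len(self.covered) + len(self.uncovered)
        # A patch with no executable changed lines is treated as fully covered.
        return 1.0 if total == 0 else len(self.covered) / total


def patch_coverage(changed_lines, executed_lines):
    """Both arguments map a file path to a set of line numbers (assumed inputs)."""
    covered, uncovered = set(), set()
    for path, lines in changed_lines.items():
        hit = executed_lines.get(path, set())
        for line in lines:
            (covered if line in hit else uncovered).add((path, line))
    return PatchCoverage(frozenset(covered), frozenset(uncovered))


# Hypothetical example: the PR touches four lines, the test suite reaches three.
changed = {"pkg/stats.py": {10, 11, 12, 40}}
executed = {"pkg/stats.py": {1, 2, 10, 11, 40}}
result = patch_coverage(changed, executed)
print(sorted(result.uncovered))  # the "last-mile" lines a generator would target
print(f"{result.ratio:.2f}")
```

In this sketch, a PR has full patch coverage when `uncovered` is empty. The uncovered set is what a last-mile test generator would aim to exercise.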
Problem

Research questions and friction points this paper is trying to address.

pull request
regression testing
test coverage
last-mile gap
software evolution
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based test generation
pull request testing
test augmentation
patch coverage
test context extraction
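
As a rough illustration of test context extraction, the Python sketch below statically scans a pytest-style test module and groups its functions into tests, fixtures, and other helpers (such as data generators). This page does not describe how ChaCo's two extraction techniques work, so this is a generic stand-in built on the standard `ast` module. The embedded module source and the name `extract_test_context` are hypothetical, and `ast.unparse` requires Python 3.9 or later.

```python
import ast

# Hypothetical test module used only for this illustration.
SOURCE = '''
import pytest

@pytest.fixture
def sample_frame():
    return {"a": [1, 2, 3]}

def make_random_data(n):
    return list(range(n))

def test_sum(sample_frame):
    assert sum(sample_frame["a"]) == 6

class TestMean:
    def test_mean(self, sample_frame):
        assert sum(sample_frame["a"]) / 3 == 2
'''


def _is_fixture(node):
    """True if the function carries a @pytest.fixture or @fixture decorator."""
    for dec in node.decorator_list:
        # Handle both @pytest.fixture and @pytest.fixture(scope="module").
        target = dec.func if isinstance(dec, ast.Call) else dec
        if ast.unparse(target) in ("pytest.fixture", "fixture"):
            return True
    return False


def extract_test_context(source):
    """Group function names in a test module by their role."""
    context = {"tests": [], "fixtures": [], "helpers": []}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if _is_fixture(node):
            context["fixtures"].append(node.name)
        elif node.name.startswith("test_"):
            context["tests"].append(node.name)
        else:
            context["helpers"].append(node.name)
    for names in context.values():
        names.sort()  # ast.walk order is unspecified, so sort for stable output
    return context


context = extract_test_context(SOURCE)
print(context)
```

A prompt builder could then pass the source of the selected tests, fixtures, and helpers to the LLM so that generated tests reuse them. How to choose which ones are relevant to a given PR is the harder problem the paper addresses, and this sketch does not attempt it.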