🤖 AI Summary
In LLM-driven software testing, regression defenses typically lag behind code changes, delaying defect detection. Method: The paper proposes a two-part taxonomy of "hardening tests" (which protect existing functionality against future regressions) and "catching tests" (which catch faults introduced by a code change), formally defines both, and describes the mechanism by which a hardening test becomes a catching test once a future regression trips it. This motivates the novel Catching Just-in-Time Test (JiTTest) Challenge: generating tests "just-in-time", before a code change lands in production, so that newly introduced faults are caught at review time; the same machinery can be repurposed to surface latent faults in legacy code. Contribution/Results: The paper enumerates the possible outcomes for hardening tests, catching tests, and JiTTests, discusses deployment options and open research problems in LLM-augmented, just-in-time testing, and reports initial results from automated LLM-based hardening deployed at Meta.
📝 Abstract
Despite decades of research and practice in automated software testing, several fundamental concepts remain ill-defined and under-explored, yet offer enormous potential real-world impact. We show that these concepts raise exciting new challenges in the context of Large Language Models for software test generation. More specifically, we formally define and investigate the properties of hardening and catching tests. A hardening test is one that seeks to protect against future regressions, while a catching test is one that catches such a regression or a fault in new functionality introduced by a code change. Hardening tests can be generated at any time and may become catching tests when a future regression is caught. We also define and motivate the Catching 'Just-in-Time' (JiTTest) Challenge, in which tests are generated 'just-in-time' to catch new faults before they land in production. We show that any solution to Catching JiTTest generation can also be repurposed to catch latent faults in legacy code. We enumerate possible outcomes for hardening and catching tests and JiTTests, and discuss open research problems, deployment options, and initial results from our work on automated LLM-based hardening at Meta. This paper (author order is alphabetical; the corresponding author is Mark Harman) was written to accompany the keynote by the authors at the ACM International Conference on the Foundations of Software Engineering (FSE) 2025.
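To make the hardening/catching distinction concrete, here is a minimal sketch of how a candidate test could be classified by running it on both sides of a code change. This is not the authors' implementation: the helper `run_test`, the git/pytest tooling, and the outcome labels are all illustrative assumptions.

```python
import subprocess

# Illustrative sketch only: classifies a candidate test against a code
# change (parent -> head), following the paper's hardening/catching
# distinction. The tooling (git, pytest) and all names are assumptions.

def run_test(test_path: str, revision: str) -> bool:
    """Check out `revision` and run a single test; True means it passes."""
    subprocess.run(["git", "checkout", "--quiet", revision], check=True)
    result = subprocess.run(["pytest", "-q", test_path])
    return result.returncode == 0

def classify(test_path: str, parent: str, head: str) -> str:
    """Classify a generated test relative to the change parent -> head."""
    passes_before = run_test(test_path, parent)
    passes_after = run_test(test_path, head)

    if passes_before and passes_after:
        # Passes on both revisions: a hardening test. It guards current
        # behaviour and becomes a catching test if a future regression
        # makes it fail.
        return "hardening"
    if passes_before and not passes_after:
        # Passes on the parent but fails on the change: a catching test,
        # signalling a likely regression before the change lands.
        return "catching"
    if not passes_before and not passes_after:
        # Fails on both revisions: either a broken/flaky test or evidence
        # of a latent fault already present in legacy code.
        return "latent-fault-or-broken-test"
    # Fails on the parent but passes on the change: the test encodes the
    # new intended behaviour rather than guarding the old one.
    return "encodes-new-behaviour"
```

Under this scheme, a JiTTest pipeline would apply the classification at diff-review time, so a "catching" outcome flags the fault before the change lands, while a test that fails on both revisions may point to a latent fault in legacy code, matching the repurposing the abstract describes.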