Harden and Catch for Just-in-Time Assured LLM-Based Software Testing: Open Research Challenges

📅 2025-04-23
🤖 AI Summary
In LLM-driven software testing, defenses against regression errors lag behind code changes, delaying defect detection. Method: The paper proposes a binary paradigm of "hardening tests" (which protect the robustness of existing functionality) and "catching tests" (which catch newly introduced defects), formally defines both, and describes how a hardening test can dynamically become a catching test when a future regression is caught. This motivates the Catching Just-in-Time Test (JiTTest) Challenge: generating tests "just-in-time", before a code change lands, so that new faults are caught before they reach production; any solution to this challenge can also be repurposed to catch latent faults in legacy code. Contribution/Results: The paper enumerates the possible outcomes for hardening tests, catching tests, and JiTTests, reports initial results from automated LLM-based hardening at Meta, and sets out open research problems and deployment options for LLM-augmented, just-in-time testing.

📝 Abstract
Despite decades of research and practice in automated software testing, several fundamental concepts remain ill-defined and under-explored, yet offer enormous potential real-world impact. We show that these concepts raise exciting new challenges in the context of Large Language Models for software test generation. More specifically, we formally define and investigate the properties of hardening and catching tests. A hardening test is one that seeks to protect against future regressions, while a catching test is one that catches such a regression or a fault in new functionality introduced by a code change. Hardening tests can be generated at any time and may become catching tests when a future regression is caught. We also define and motivate the Catching 'Just-in-Time' (JiTTest) Challenge, in which tests are generated 'just-in-time' to catch new faults before they land into production. We show that any solution to Catching JiTTest generation can also be repurposed to catch latent faults in legacy code. We enumerate possible outcomes for hardening and catching tests and JiTTests, and discuss open research problems, deployment options, and initial results from our work on automated LLM-based hardening at Meta. This paper (author order is alphabetical; the corresponding author is Mark Harman) was written to accompany the keynote by the authors at the ACM International Conference on the Foundations of Software Engineering (FSE) 2025.
Problem

Research questions and friction points this paper is trying to address.

Defining hardening and catching tests for LLM-based software testing
Addressing the Catching Just-in-Time (JiTTest) Challenge for fault prevention
Exploring automated LLM-based test generation for legacy code faults
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hardening tests protect against future regressions
Catching tests detect faults in new code changes
Just-in-Time tests catch faults before production
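The distinction above can be illustrated with a minimal sketch (a hypothetical example, not taken from the paper; the function names are invented for illustration): a hardening test pins down the current behavior of existing code, and when a later code change introduces a regression, that same test becomes a catching test.

```python
# Legacy implementation with trusted, existing behavior.
def apply_discount(price, pct):
    return round(price * (1 - pct / 100), 2)

# Hardening test: generated at any time against the current code,
# it protects existing functionality from *future* regressions.
def hardening_test(impl):
    assert impl(100.0, 10) == 90.0
    assert impl(100.0, 0) == 100.0

# A later code change introduces a fault (off-by-one in the percentage).
def apply_discount_changed(price, pct):
    return round(price * (1 - (pct + 1) / 100), 2)  # regression

# Re-run just-in-time against the pending change, the hardening test
# becomes a catching test: it catches the fault before it lands.
def catches_regression():
    try:
        hardening_test(apply_discount_changed)
    except AssertionError:
        return True  # fault caught pre-production
    return False

hardening_test(apply_discount)      # passes on the legacy code
print(catches_regression())         # prints True: regression detected
```

The same test body serves both roles; only the point in time at which it is executed (and the version of the code it runs against) determines whether it is hardening or catching.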