A Framework for Creating Non-Regressive Test Cases via Branch Consistency Analysis Driven by Descriptions

📅 2025-06-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing test generation methods predominantly assume the correctness of the target method, rendering them ineffective at exposing faults in non-regression scenarios—i.e., when the subject code contains actual defects. To address this limitation, we propose DISTINCT, a description-driven branch-consistency analysis framework that leverages natural-language functional descriptions to guide large language models (LLMs) in generating compilable, semantically aligned, and defect-sensitive test cases. DISTINCT pioneers the integration of functional descriptions with branch-level behavioral consistency analysis, shifting the paradigm from coverage-oriented to defect-aware testing. We introduce the first functional-description-augmented defect benchmarks: Defects4J-Desc and QuixBugs-Desc. DISTINCT employs a three-stage iterative architecture combining LLM generation, compilation feedback, and semantic alignment. Experiments demonstrate that DISTINCT achieves average improvements of 14.64% in compilation success rate, 6.66% in test pass rate, and up to 149.26% in defect detection rate, along with 3.77% and 5.36% gains in statement and branch coverage, respectively.

📝 Abstract

Automated test-generation research overwhelmingly assumes that focal methods are correct, yet practitioners routinely face non-regression scenarios in which the focal method may be defective. A baseline evaluation of EvoSuite and two leading Large Language Model (LLM)-based generators, ChatTester and ChatUniTest, on defective focal methods reveals that, despite achieving up to 83% branch coverage, none of the generated tests expose defects. To address this problem, we first construct two new benchmarks, Defects4J-Desc and QuixBugs-Desc, in which each focal method is paired with a Natural Language Description (NLD) of its intended functionality. We then propose DISTINCT, a description-guided branch-consistency analysis framework that transforms LLMs into fault-aware test generators. DISTINCT comprises three components: (1) a Generator that derives initial tests from the NLD and the focal method, (2) a Validator that iteratively fixes uncompilable tests using compiler diagnostics, and (3) an Analyzer that iteratively aligns test behavior with NLD semantics via branch-level analysis. Extensive experiments confirm the effectiveness of our approach. Compared to state-of-the-art methods, DISTINCT achieves an average improvement of 14.64% in Compilation Success Rate (CSR) and 6.66% in Passing Rate (PR) across both benchmarks. It also improves Defect Detection Rate (DDR) on both benchmarks, with a particularly large gain of 149.26% on Defects4J-Desc. In terms of code coverage, DISTINCT improves Statement Coverage (SC) by an average of 3.77% and Branch Coverage (BC) by 5.36%. These results set a new baseline for non-regressive test generation and highlight how description-driven reasoning enables LLMs to move beyond coverage chasing toward effective defect detection.
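The three components named in the abstract can be pictured as a small control loop. The sketch below is only an illustration of that shape, not the authors' implementation: every function name, the `Outcome` type, the `max_rounds` limit, and the decision to stop when a test cannot be made compilable are assumptions, and the LLM, compiler, and branch-level analysis are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Outcome:
    test: str        # final test source
    compiled: bool   # True if the Validator stage ended without diagnostics
    aligned: bool    # True if the Analyzer stage found no remaining mismatch


def run_pipeline(
    description: str,
    focal_method: str,
    generate: Callable[[str, str], str],                          # Generator (hypothetical)
    compile_check: Callable[[str], List[str]],                    # returns compiler diagnostics
    repair: Callable[[str, List[str]], str],                      # Validator fix step
    find_misalignment: Callable[[str, str, str], Optional[str]],  # Analyzer check
    realign: Callable[[str, str], str],                           # Analyzer fix step
    max_rounds: int = 3,                                          # assumed iteration budget
) -> Outcome:
    # Stage 1: derive an initial test from the description and the focal method.
    test = generate(description, focal_method)

    # Stage 2: feed compiler diagnostics back until the test compiles.
    diagnostics = compile_check(test)
    rounds = 0
    while diagnostics and rounds < max_rounds:
        test = repair(test, diagnostics)
        diagnostics = compile_check(test)
        rounds += 1
    if diagnostics:
        return Outcome(test, compiled=False, aligned=False)

    # Stage 3: align the test's expectations with the description.
    issue = find_misalignment(test, description, focal_method)
    rounds = 0
    while issue is not None and rounds < max_rounds:
        test = realign(test, issue)
        issue = find_misalignment(test, description, focal_method)
        rounds += 1
    return Outcome(test, compiled=True, aligned=issue is None)
```

A real system would presumably re-check compilation after each realignment; that step is left out here to keep the sketch short.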

Problem

Research questions and friction points this paper is trying to address.

Generating tests that detect defects in focal methods

Improving test compilation and passing rates using NLDs

Enhancing defect detection via branch-consistency analysis
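To make the problem concrete, here is a small, invented illustration of why high coverage alone does not expose a defect. The `clamp` function, its seeded bug, and both tests are hypothetical and do not come from Defects4J-Desc or QuixBugs-Desc. The first test takes its expected values from the current (defective) behavior, as a regression-oriented generator would; the second takes them from the natural-language description.

```python
def clamp(value, low, high):
    """Description (NLD): return value limited to the range [low, high]."""
    if value < low:
        return low
    if value > high:
        return low  # seeded defect: the description implies `high`
    return value


def regression_style_test():
    # Expected values mirror what the defective code currently returns.
    # All three branches are exercised, so branch coverage is complete.
    assert clamp(-5, 0, 10) == 0
    assert clamp(15, 0, 10) == 0
    assert clamp(5, 0, 10) == 5


def description_style_test():
    # Expected values follow the description, not the implementation.
    assert clamp(-5, 0, 10) == 0
    assert clamp(15, 0, 10) == 10
    assert clamp(5, 0, 10) == 5


def detects_defect(test):
    """A test detects the defect if it fails on the defective method."""
    try:
        test()
    except AssertionError:
        return True
    return False
```

Both tests reach the same branches; only the one whose oracle comes from the description fails on the defective method and therefore reveals the bug.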

Innovation

Methods, ideas, or system contributions that make the work stand out.

Description-guided branch-consistency analysis framework

Iterative alignment of test behavior with NLD semantics

Validator that repairs uncompilable tests using compiler diagnostics

🔎 Similar Papers

Enhancing LLM-based Test Generation for Hard-to-Cover Branches via Program Analysis