Use Property-Based Testing to Bridge LLM Code Generation and Validation

📅 2025-06-23
🤖 AI Summary
Large language models (LLMs) struggle to guarantee functional correctness in code generation. Conventional test-driven development (TDD) is hampered by low-quality test cases and risks “self-deception”, where erroneous code passes flawed tests. This paper proposes the Property-Generated Solver (PGS) framework, which integrates property-based testing (PBT) into the verification of LLM-generated code. PGS replaces brittle, example-based test oracles with high-level semantic program properties, which gives the generator robust, semantics-aware feedback. A Generator agent and a Tester agent work in a closed loop, breaking the vicious cycle in which buggy code and defective tests reinforce each other. By combining PBT, TDD-style refinement, and semantic feedback, PGS improves pass@1 by 23.1% to 37.3% (relative gains) over established TDD-based methods across multiple code generation benchmarks.

📝 Abstract
Large Language Models (LLMs) excel at code generation, but ensuring that their outputs are functionally correct, especially in complex programming tasks, is a persistent challenge. While traditional Test-Driven Development (TDD) offers a path for code refinement, its efficacy with LLMs is often undermined by the scarcity of high-quality test cases or the pitfalls of automated test generation, including biased tests or inaccurate output predictions that can misdirect the correction process. This paper introduces Property-Generated Solver, a novel framework that leverages Property-Based Testing (PBT) to validate high-level program properties or invariants, instead of relying on specific input-output examples. These properties are often simpler to define and verify than directly predicting exhaustive test oracles, breaking the "cycle of self-deception" where tests might share flaws with the code they are meant to validate. Property-Generated Solver employs two collaborative LLM-based agents: a Generator dedicated to code generation and iterative refinement, and a Tester that manages the PBT life-cycle and formulates semantically rich feedback from property violations. The resulting comprehensive and actionable feedback then guides the Generator in its refinement efforts. By establishing PBT as the core validation engine within this iterative, closed-loop paradigm, Property-Generated Solver provides a robust mechanism for steering LLMs towards more correct and generalizable code. Extensive experimental results on multiple code generation benchmarks demonstrate that Property-Generated Solver achieves substantial pass@1 improvements, ranging from 23.1% to 37.3% relative gains over established TDD methods.
Problem

Research questions and friction points this paper is trying to address.

Ensuring functional correctness of LLM-generated code in complex tasks
Overcoming limitations of traditional Test-Driven Development with LLMs
Validating program properties instead of specific input-output examples
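
To make the contrast between input-output examples and properties concrete, here is a minimal sketch using only the Python standard library. It is not the paper's implementation: the sorting task, the function `buggy_sort`, the two properties, and the random input generator are all assumptions chosen for illustration. The faulty function passes a hand-picked example test but violates a property on randomly generated inputs.

```python
import random
from collections import Counter


def buggy_sort(xs):
    """Hypothetical faulty candidate: sorts, but silently drops duplicates."""
    return sorted(set(xs))


def is_ordered(out):
    # Property 1: every element is <= its successor.
    return all(a <= b for a, b in zip(out, out[1:]))


def is_permutation(xs, out):
    # Property 2: the output contains exactly the input's elements.
    return Counter(xs) == Counter(out)


def find_counterexample(fn, trials=200, seed=0):
    """Check both properties on random inputs; return a failing input or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(0, 5) for _ in range(rng.randint(0, 8))]
        out = fn(list(xs))
        if not (is_ordered(out) and is_permutation(xs, out)):
            return xs
    return None


# An example-based test with no repeated values does not expose the bug.
assert buggy_sort([3, 1, 2]) == [1, 2, 3]

# The property check finds an input with duplicates that breaks the candidate.
print(find_counterexample(buggy_sort))
# The built-in sorted() satisfies both properties on every generated input.
print(find_counterexample(sorted))
```

Neither property requires predicting the exact expected output for a given input, which is the kind of oracle prediction the abstract identifies as error-prone.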
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Property-Based Testing for LLM code validation
Employs two collaborative LLM agents for refinement
Improves code correctness via iterative feedback loop
Lehan He
Nanjing University of Posts and Telecommunications
Computer Science, Artificial Intelligence
Jing Shao
Research Scientist, Shanghai AI Laboratory/Shanghai Jiao Tong University
Computer Vision, Multi-Modal Large Language Model
Zeren Chen
School of Software, Beihang University, Shanghai AI Laboratory, Beijing, China
Xiang Gao
School of Software, Beihang University, Beijing, China
Zhe Zhang
School of Software, Beihang University, Beijing, China
Lu Sheng
School of Software, Beihang University
Embodied AI, 3D Vision, Machine Learning