Scaling Test-Driven Code Generation from Functions to Classes: An Empirical Study

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing test-driven code generation approaches are largely confined to the function level and struggle to handle the complexity of class-level code, where multiple methods interact through shared state and invocation dependencies. This work proposes an iterative test-driven framework that addresses this challenge by analyzing intra-class method dependencies to determine a synthesis order, then progressively generating complete class implementations through a combination of public test execution, reflective execution feedback, and bounded repair iterations. The study presents the first effective extension of test-driven program synthesis to the class level and introduces ClassEval-TDD, a standardized benchmark for evaluation. Experiments across eight large language models demonstrate that the proposed method improves class-level pass rates by 12–26 percentage points, achieving up to 71% full correctness, with only a few repair iterations required on average.

📝 Abstract
Test-driven development (TDD) has been adopted to improve Large Language Model (LLM)-based code generation by using tests as executable specifications. However, existing TDD-style code generation studies are largely limited to function-level tasks, leaving class-level synthesis, where multiple methods interact through shared state and call dependencies, underexplored. In this paper, we scale test-driven code generation from functions to classes via an iterative TDD framework. Our approach first analyzes intra-class method dependencies to derive a feasible generation schedule, and then incrementally implements each method under method-level public tests with reflection-style execution feedback and bounded repair iterations. To support test-driven generation and rigorous class-level evaluation, we construct ClassEval-TDD, a cleaned and standardized variant of ClassEval with consistent specifications, deterministic test environments, and complete method-level public tests. We conduct an empirical study across eight LLMs and compare against the strongest direct-generation baseline (the best of the holistic, incremental, and compositional strategies). Our class-level TDD framework consistently improves class-level correctness by 12 to 26 absolute percentage points and achieves up to 71% fully correct classes, while requiring only a small number of repairs on average. These results demonstrate that test-driven generation can scale beyond isolated functions and substantially improve the reliability of class-level code generation. All code and data are available at https://anonymous.4open.science/r/ClassEval-TDD-C4C9/
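The per-method loop of public test execution, execution feedback, and bounded repair can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's framework: `generate` is a hypothetical callable standing in for an LLM call, `run_public_test` and `generate_with_repair` are invented names, and the candidate code is executed with plain `exec`, which is not sandboxed and would be unsafe for untrusted model output.

```python
from typing import Callable, Optional, Tuple


def run_public_test(code: str, test: str) -> Optional[str]:
    """Run candidate code, then its public test, in one fresh namespace.

    Returns None if the test passes, otherwise a short error message
    that can be fed back to the generator.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)  # assumption: no sandboxing in this sketch
        exec(test, namespace)
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def generate_with_repair(
    generate: Callable[[str, Optional[str]], str],
    spec: str,
    test: str,
    max_repairs: int = 3,
) -> Tuple[str, int, bool]:
    """Generate code for `spec`, retrying with feedback at most `max_repairs` times.

    Returns (last candidate, number of repairs used, whether the test passed).
    """
    feedback: Optional[str] = None
    code = ""
    for attempt in range(max_repairs + 1):
        # First call has no feedback; later calls see the last failure.
        code = generate(spec, feedback)
        feedback = run_public_test(code, test)
        if feedback is None:
            return code, attempt, True
    return code, max_repairs, False
```

In a class-level setting, a loop like this would run once per method in the scheduled order, with previously accepted methods included in the code under test; that composition is left out here to keep the sketch short.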
Problem

Research questions and friction points this paper is trying to address.

test-driven development
class-level code generation
large language models
method dependencies
executable specifications
Innovation

Methods, ideas, or system contributions that make the work stand out.

test-driven development
class-level code generation
method dependency analysis
iterative repair
LLM-based code synthesis
Yunhao Liang
Chengdu Institute of Computer Applications, Chinese Academy of Sciences and University of Chinese Academy of Sciences, China
Ruixuan Ying
Institute of Multidisciplinary Research for Advanced Materials (IMRAM), Tohoku University, Japan
Zhe Cui
Beijing University of Posts and Telecommunications
Shiwen Ni
Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences and Artificial Intelligence Research Institute, Shenzhen University of Advanced Technology, China