🤖 AI Summary
Existing LLM-based automated test generation primarily produces static input-output assertion pairs, resulting in limited test diversity and insufficient debugging information. This work proposes a novel paradigm for generating executable test harnesses—supporting dynamic input construction and flexible output validation (e.g., invariant checking). Methodologically, we design a two-stage training framework: first, supervised fine-tuning (SFT) to teach LLMs the structural conventions of test scripts; second, reinforcement learning with a custom reward function (RLVR) to optimize test quality along dimensions such as correctness, coverage, and verifiability. Empirical evaluation demonstrates substantial improvements in defect detection rate and test strategy diversity; moreover, the generated harnesses support runtime extension to further enhance code generation fidelity. Our core contribution is the first systematic advancement of LLM-driven test generation—from static assertion pairs to fully executable, formally verifiable, and extensible test programs.
📝 Abstract
Existing LLM-based automatic test generation methods mainly produce pairs of inputs and expected outputs to characterize the intended behavior of correct programs. Although straightforward, these methods yield limited diversity in the generated tests and cannot provide enough debugging information. We propose HarnessLLM, a two-stage training pipeline that enables LLMs to write harness code for testing. Specifically, LLMs generate code that synthesizes inputs and validates the observed outputs, allowing complex test cases and flexible output validation such as invariant checking. To achieve this, we train LLMs with supervised fine-tuning (SFT) followed by reinforcement learning with verifiable rewards (RLVR) using a customized reward design. Experiments show that HarnessLLM outperforms input-output-based testing in bug finding and in the diversity of testing strategies. HarnessLLM further improves code generation performance through test-time scaling, using the generated test cases for inference-phase validation. Our code is available at https://github.com/UCSB-NLP-Chang/HarnessLLM.git.
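To make the contrast with static input-output pairs concrete, here is a minimal, hypothetical sketch of what such a harness might look like (the function names and structure are illustrative assumptions, not the paper's actual generated code): instead of asserting fixed expected outputs, the harness synthesizes random inputs and validates outputs via invariants.

```python
import random

def harness(sort_fn, trials=100):
    """Hypothetical test harness for a sorting routine: synthesize inputs,
    then validate observed outputs against invariants rather than against
    fixed expected values."""
    for _ in range(trials):
        # Dynamic input construction: random list of random length.
        xs = [random.randint(-1000, 1000) for _ in range(random.randint(0, 50))]
        out = sort_fn(list(xs))
        # Invariant 1: output is non-decreasing.
        assert all(a <= b for a, b in zip(out, out[1:])), f"not sorted on input: {xs}"
        # Invariant 2: output is a permutation of the input.
        assert sorted(xs) == sorted(out), f"elements changed on input: {xs}"
    return True
```

A correct implementation (e.g., the built-in `sorted`) passes all trials, while a buggy one is flagged along with the concrete failing input, which is the kind of debugging information a static assertion pair cannot provide.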