Klear-CodeTest: Scalable Test Case Generation for Code Reinforcement Learning

📅 2025-08-07
🤖 AI Summary
In code reinforcement learning, the automatic synthesis of high-quality test cases remains a critical bottleneck for improving feedback accuracy. This paper proposes the Generator-Validation (G-V) framework: in the first stage, diverse candidate test cases—covering both common and edge-case scenarios—are generated via multi-strategy synthesis; in the second stage, correctness, safety, and discriminative power are rigorously ensured through gold-solution comparison, program output consistency checking, and a multi-layer security sandbox system specifically designed for online execution. Experiments demonstrate that the G-V framework significantly enhances training stability and final model performance, yielding test cases with high accuracy and strong generalization capability. All source code, datasets, and the sandbox system are publicly released.

📝 Abstract
Precise, correct feedback is crucial for effectively training large language models (LLMs) in code reinforcement learning. However, synthesizing high-quality test cases remains a profoundly challenging and unsolved problem. In this work, we present Klear-CodeTest, a comprehensive test case synthesis framework featuring rigorous verification to ensure quality and reliability of test cases. Our approach achieves broad coverage of programming problems via a novel Generator-Validation (G-V) framework, ensuring correctness through a consistency validation mechanism that verifies outputs against gold solutions. The proposed G-V framework generates comprehensive test cases including both regular and corner cases, enhancing test coverage and discriminative power for solution correctness assessment in code reinforcement learning. In addition, we design a multi-layered security sandbox system optimized for online verification platforms, guaranteeing safe and reliable code execution. Through comprehensive experiments, we demonstrate the effectiveness of our curated dataset, showing significant improvements in model performance and training stability. The source codes, curated dataset and sandbox system are available at: https://github.com/Kwai-Klear/CodeTest.
Problem

Research questions and friction points this paper is trying to address.

Synthesizing high-quality test cases for code reinforcement learning
Ensuring correctness and coverage of test cases via G-V framework
Providing safe and reliable code execution for online verification
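The third point above — safe online execution — could be approximated at the OS level with resource limits applied to the child process before it runs. This is only a single-layer sketch under an assumed Unix environment (the `resource` module is Unix-only), far simpler than the multi-layered sandbox the paper describes; the function names are hypothetical.

```python
import resource
import subprocess
import sys


def limit_resources():
    """Applied in the child process before exec: cap CPU time at 2 seconds
    and the address space at 256 MB."""
    resource.setrlimit(resource.RLIMIT_CPU, (2, 2))
    resource.setrlimit(resource.RLIMIT_AS, (256 * 1024**2, 256 * 1024**2))


def sandboxed_run(source: str, stdin_data: str) -> str:
    """Run untrusted Python code with wall-clock timeout and resource caps."""
    proc = subprocess.run(
        [sys.executable, "-I", "-c", source],  # -I: isolated mode
        input=stdin_data,
        capture_output=True,
        text=True,
        preexec_fn=limit_resources,  # Unix-only hook, runs in the child
        timeout=5,
    )
    return proc.stdout.strip()
```

A production sandbox would add further layers (namespaces, seccomp filters, containerization), which is what the paper's system is designed for.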
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generator-Validation framework for test cases
Multi-layered security sandbox for safety
Consistency validation ensures test correctness