CCrepairBench: A High-Fidelity Benchmark and Reinforcement Learning Framework for C++ Compilation Repair

📅 2025-09-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the dual bottlenecks of data scarcity and insufficient semantic correctness in automated C++ compilation error repair, this paper introduces the first large-scale, high-fidelity repair dataset and proposes a reinforcement learning (RL) framework guided by hybrid reward signals. We design an LLM-as-a-Judge two-stage evaluation mechanism that jointly verifies syntactic validity and semantic correctness to ensure patch quality. Furthermore, we integrate large language models (LLMs), RL, and human alignment via a generate-and-verify pipeline, closing the loop between dataset construction and model training. Applying RL fine-tuning to Qwen2.5-1.5B-Instruct, our approach achieves repair performance comparable to that of 14B-class models, demonstrating substantial gains in the practical utility and scalability of small models in real-world development scenarios.

📝 Abstract
The automated repair of C++ compilation errors presents a significant challenge, the resolution of which is critical for developer productivity. Progress in this domain is constrained by two primary factors: the scarcity of large-scale, high-fidelity datasets and the limitations of conventional supervised methods, which often fail to generate semantically correct patches. This paper addresses these gaps by introducing a comprehensive framework with three core contributions. First, we present CCrepair, a novel, large-scale C++ compilation error dataset constructed through a sophisticated generate-and-verify pipeline. Second, we propose a Reinforcement Learning (RL) paradigm guided by a hybrid reward signal, shifting the focus from mere compilability to the semantic quality of the fix. Finally, we establish a robust, two-stage evaluation system that provides this signal, centered on an LLM-as-a-Judge whose reliability has been rigorously validated against the collective judgments of a panel of human experts. This integrated approach aligns the training objective with generating high-quality, non-trivial patches that are both syntactically and semantically correct. The effectiveness of our approach was demonstrated experimentally. Our RL-trained Qwen2.5-1.5B-Instruct model achieved performance comparable to a Qwen2.5-14B-Instruct model, validating the efficiency of our training paradigm. Our work provides the research community with a valuable new dataset and a more effective paradigm for training and evaluating robust compilation repair models, paving the way for more practical and reliable automated programming assistants.
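The generate-and-verify dataset construction the abstract describes can be sketched roughly as follows. This is an illustrative Python stub only, not the paper's implementation: the `compiles` check stands in for invoking a real C++ compiler (e.g. `g++ -fsyntax-only`), and all names (`RepairPair`, `generate_and_verify`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RepairPair:
    """One verified (broken, fixed) training sample."""
    broken: str
    fixed: str
    diagnostic: str


def compiles(source: str) -> bool:
    # Stand-in for a real compiler invocation; this toy check only
    # treats a missing trailing semicolon as a compile failure.
    return source.rstrip().endswith(";")


def generate_and_verify(fixed: str,
                        corrupt: Callable[[str], str]) -> Optional[RepairPair]:
    """Generate a broken variant of a compilable snippet and keep the
    pair only if the corruption is verified to fail compilation while
    the original still compiles."""
    broken = corrupt(fixed)
    if compiles(fixed) and not compiles(broken):
        return RepairPair(broken, fixed, diagnostic="expected ';'")
    return None  # discard samples that fail verification


# Toy corruption: drop the trailing semicolon from a valid statement.
pair = generate_and_verify("int x = 0;", lambda s: s.rstrip(";"))
```

The verification step is what makes the dataset "high-fidelity": unverified corruptions never enter the training set.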
Problem

Research questions and friction points this paper is trying to address.

Automated repair of C++ compilation errors
Addresses scarcity of high-fidelity datasets
Improves semantic correctness of generated patches
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale C++ dataset via generate-and-verify pipeline
Reinforcement Learning with hybrid semantic reward signals
LLM-as-a-Judge evaluation system validated by experts
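The hybrid reward described above can be sketched as a two-stage gate. This is a minimal illustration under assumed semantics, not the paper's reward function: both `compiles` and `judge_semantics` are stubs (a real system would call a compiler and an LLM judge), and the weights are hypothetical.

```python
def compiles(patched: str) -> bool:
    # Stage 1 stand-in: syntactic validity via a compiler check.
    return "error" not in patched


def judge_semantics(original: str, patched: str) -> float:
    # Stage 2 stand-in for the LLM-as-a-Judge call, scoring in [0, 1].
    # This toy judge penalizes trivial patches that mostly delete code
    # instead of preserving the original program's intent.
    return 1.0 if len(patched) >= 0.8 * len(original) else 0.2


def hybrid_reward(original: str, patched: str,
                  w_compile: float = 0.3, w_semantic: float = 0.7) -> float:
    """Reward = 0 unless the patch compiles; otherwise a base reward
    for compilability plus a weighted semantic-quality score."""
    if not compiles(patched):
        return 0.0  # hard gate: uncompilable patches earn nothing
    return w_compile + w_semantic * judge_semantics(original, patched)
```

Gating on compilation before judging semantics is what shifts the training objective from mere compilability toward non-trivial, semantically correct fixes.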
Weixuan Sun
Tencent | PhD ANU
computer vision, machine learning, natural language processing
Jucai Zhai
AIM, ZTE Corporation, Changsha, China
Dengfeng Liu
AIM, ZTE Corporation, Changsha, China
Xin Zhang
Laboratory of Unmanned Combat Systems, National University of Defense Technology, Changsha, China
Xiaojun Wu
AIM, ZTE Corporation, Changsha, China
Qiaobo Hao
AIM, ZTE Corporation, Changsha, China
AIMgroup
AIM, ZTE Corporation, Changsha, China
Yang Fang
Laboratory for Big Data and Decision, National University of Defense Technology, Changsha, China
Jiuyang Tang
Laboratory for Big Data and Decision, National University of Defense Technology, Changsha, China