CRScore++: Reinforcement Learning with Verifiable Tool and AI Feedback for Code Review

📅 2025-05-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of optimizing unstructured text generation in code review comment synthesis using reinforcement learning (RL), where conventional RL rewards suffer from unreliability and sparsity. We propose a unified reward function that jointly incorporates verifiable tool signals—such as linter outputs and code smell detections—with subjective quality feedback from large language models (LLMs). Methodologically, we are the first to integrate structured tool signals and LLM feedback synergistically within Proximal Policy Optimization (PPO) training, enabling an end-to-end trainable, review-quality-driven framework supporting both supervised fine-tuning and teacher-student RL critique training. Key contributions include: (1) overcoming the fundamental unreliability of RL feedback for unstructured text generation; (2) achieving cross-language generalization; and (3) attaining multilingual state-of-the-art performance on the CRScore benchmark, significantly improving review quality for weaker student models and successfully transferring to unseen programming languages.
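The core idea of the unified reward can be sketched as a weighted blend of a verifiable tool signal and a subjective LLM quality score. The function names, the 0–1 scaling, and the `alpha` weighting below are illustrative assumptions, not the paper's exact formulation:

```python
def tool_signal(issues_before: int, issues_after: int) -> float:
    """Verifiable reward: fraction of tool-detected issues (e.g., linter
    warnings, code smells) that the generated review helps resolve."""
    if issues_before == 0:
        return 1.0  # nothing to fix; treat as fully resolved
    fixed = max(issues_before - issues_after, 0)
    return fixed / issues_before

def combined_reward(issues_before: int, issues_after: int,
                    llm_score: float, alpha: float = 0.5) -> float:
    """Blend the verifiable signal with an LLM judge's quality score
    (assumed normalized to [0, 1]) into one scalar PPO reward."""
    assert 0.0 <= llm_score <= 1.0
    return alpha * tool_signal(issues_before, issues_after) + (1 - alpha) * llm_score

# e.g., 3 of 4 linter issues addressed, LLM judge rates the comment 0.8
reward = combined_reward(issues_before=4, issues_after=1, llm_score=0.8)
```

During PPO training, a scalar like `reward` would be assigned to each sampled review comment; how CRScore++ actually normalizes and weights the two components is specified in the paper itself.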

📝 Abstract
Reinforcement learning (RL) to improve code review comment generation requires handling unstructured outputs, making RL feedback challenging. The two main RL approaches, namely RL with Verifiable Feedback (RLVR) and RL with AI Feedback (RLAIF), offer a trade-off: RLVR provides reliable feedback for structured tasks like code generation, while RLAIF handles unstructured outputs but is subjective. We bridge this gap with CRScore++, an RL framework that leverages both LLM-based subjective feedback and verifiable signals for training. Extending CRScore, a code review evaluation metric that integrates LLMs with verifiers such as linters and code smell detectors, CRScore++ transforms these signals into training rewards. We show that CRScore++ improves a weaker student model through a combination of supervised fine-tuning and RL critique from a stronger teacher model, thus enabling generalization to novel programming languages.
Problem

Research questions and friction points this paper is trying to address.

Handling unstructured outputs in RL for code review comments
Bridging RLVR and RLAIF trade-offs for reliable feedback
Improving code review generalization to new programming languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines LLM-based and verifiable feedback for RL
Integrates linters and code smell detectors
Uses teacher-student model for generalization
M. Kapadnis
Language Technologies Institute, Carnegie Mellon University
Atharva Naik
PhD Student, Carnegie Mellon University
LLM4Code · LLM Reasoning · Alignment
Carolyn Rose
Language Technologies Institute, Carnegie Mellon University