RefineRL: Advancing Competitive Programming with Self-Refinement Reinforcement Learning

📅 2026-04-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitation of current large language models in competitive programming, which typically rely on single-pass generation and thus fail to leverage their iterative refinement potential. The authors propose a self-refinement approach that integrates a Skeptical-Agent mechanism with a lightweight reinforcement learning framework. The Skeptical Agent critically filters generated code through local execution-based verification, while the reinforcement learning component requires only standard RLVR data for training. By enabling multiple refinement iterations beyond the single-attempt paradigm, the method substantially enhances model performance: a fine-tuned 4B-parameter model surpasses the single-pass results of a 32B model and approaches those of a 235B model.
📝 Abstract
While large language models (LLMs) have demonstrated strong performance on complex reasoning tasks such as competitive programming (CP), existing methods predominantly focus on single-attempt settings, overlooking their capacity for iterative refinement. In this paper, we present RefineRL, a novel approach designed to unleash the self-refinement capabilities of LLMs for CP problem solving. RefineRL introduces two key innovations: (1) Skeptical-Agent, an iterative self-refinement agent equipped with local execution tools to validate generated solutions against public test cases of CP problems. This agent always maintains a skeptical attitude towards its own outputs and thereby enforces rigorous self-refinement even when validation suggests correctness. (2) A reinforcement learning (RL) solution to incentivize LLMs to self-refine with only standard RLVR data (i.e., problems paired with their verifiable answers). Extensive experiments on Qwen3-4B and Qwen3-4B-2507 demonstrate that our method yields substantial gains: after our RL training, these compact 4B models integrated with the Skeptical-Agent not only outperform much larger 32B models but also approach the single-attempt performance of 235B models. These findings suggest that self-refinement holds considerable promise for scaling LLM reasoning, with significant potential for further advancement.
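The abstract's description of the Skeptical-Agent can be read as a simple loop: generate a candidate, validate it by local execution against public test cases, and keep refining for a fixed budget even when the tests pass. A minimal sketch of that loop is below; the function names (`generate`, `refine`, `run_public_tests`) and the `solve(x)` convention are illustrative assumptions, not the paper's actual interface, and the LLM calls are stubbed out.

```python
def run_public_tests(solution_src, tests):
    """Execute candidate source locally and check it against (input, expected) pairs."""
    namespace = {}
    exec(solution_src, namespace)  # candidate is assumed to define solve(x)
    solve = namespace["solve"]
    return all(solve(inp) == expected for inp, expected in tests)

def skeptical_agent(generate, refine, tests, max_iters=3):
    """Iterative self-refinement that stays skeptical even after a pass.

    Unlike a naive loop that stops on the first passing candidate, this one
    remembers the best passing solution but refines for the full budget,
    mirroring the 'skeptical even when validation suggests correctness' idea.
    """
    candidate = generate()
    best = None
    for _ in range(max_iters):
        passed = run_public_tests(candidate, tests)
        if passed:
            best = candidate                   # keep a passing solution...
        candidate = refine(candidate, passed)  # ...but refine regardless
    if best is None and run_public_tests(candidate, tests):
        best = candidate
    return best if best is not None else candidate

# Toy stand-ins for the LLM: the task is solve(x) == 2 * x.
tests = [(1, 2), (3, 6)]
draft = "def solve(x):\n    return x + x"
result = skeptical_agent(lambda: draft, lambda cand, ok: cand, tests)
print(run_public_tests(result, tests))  # True
```

In a real system the `refine` stub would be a model call conditioned on the execution feedback, and the RL training would reward trajectories whose refinements end in a verified solution.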
Problem

Research questions and friction points this paper is trying to address.

competitive programming
self-refinement
large language models
iterative refinement
reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Refinement
Reinforcement Learning
Competitive Programming
Skeptical-Agent
LLM Reasoning