ReVeal: Self-Evolving Code Agents via Iterative Generation-Verification

📅 2025-06-13

🤖 AI Summary
Existing reinforcement learning (RL) methods for code generation lack meaningful verification feedback from realistic execution environments and do not explicitly optimize for verification, so the model's self-verification is unreliable. This paper proposes ReVeal, a multi-turn RL framework in which generation and verification co-evolve: in each turn, a large language model generates code, constructs test cases, and invokes external tools to obtain precise execution feedback, and a customized RL algorithm with dense, per-turn rewards trains both capabilities. Key contributions include: (1) an RL training paradigm in which code generation and verification co-evolve; (2) a fine-grained, turn-level reward design; and (3) support for test-time scaling into deeper inference regimes, with code continuing to evolve over additional turns. Experiments on LiveCodeBench show significant gains in Pass@k, and code quality improves consistently as the number of inference turns grows, ultimately surpassing DeepSeek-R1-Zero-Qwen-32B.
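For readers unfamiliar with the Pass@k metric mentioned above, the following is a minimal sketch of the widely used unbiased estimator, 1 - C(n-c, k) / C(n, k), computed from n sampled solutions of which c pass. It is an assumption that the paper uses this estimator; the summary does not say how Pass@k was computed, and the function name `pass_at_k` is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate Pass@k from n sampled solutions, c of which are correct.

    Returns the probability that at least one of k solutions drawn
    without replacement from the n samples is correct.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k incorrect samples: every draw of k contains a correct one.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Averaging this value over all benchmark problems gives the reported score for a given k.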

📝 Abstract
Recent advances in reinforcement learning (RL) with verifiable outcome rewards have significantly improved the reasoning capabilities of large language models (LLMs), especially when combined with multi-turn tool interactions. However, existing methods lack both meaningful verification signals from realistic environments and explicit optimization for verification, leading to unreliable self-verification. To address these limitations, we propose ReVeal, a multi-turn reinforcement learning framework that interleaves code generation with explicit self-verification and tool-based evaluation. ReVeal enables LLMs to autonomously generate test cases and invoke external tools for precise feedback, and it improves performance via a customized RL algorithm with dense, per-turn rewards. As a result, ReVeal fosters the co-evolution of a model's generation and verification capabilities through RL training, expanding the reasoning boundaries of the base model, as demonstrated by significant gains in Pass@k on LiveCodeBench. It also enables test-time scaling into deeper inference regimes, with code consistently evolving as the number of turns increases during inference, ultimately surpassing DeepSeek-R1-Zero-Qwen-32B. These findings highlight the promise of ReVeal as a scalable and effective paradigm for building more robust and autonomous AI agents.
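The interleaving of generation, self-verification, and tool feedback described in the abstract can be pictured with a small sketch. Everything here is an illustrative assumption rather than the paper's implementation: `run_tests` stands in for the external tool, `toy_policy` is a scripted stand-in for the LLM, and the entry-point name `solve` is hypothetical.

```python
from typing import Callable, List, Tuple

TestCase = Tuple[tuple, object]  # (positional args, expected result)
Policy = Callable[[List[str]], Tuple[str, List[TestCase]]]


def run_tests(code: str, tests: List[TestCase], entry: str = "solve") -> List[str]:
    """Stand-in for the external tool: run the code on each test and
    return one message per failure (an empty list means all tests passed)."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # unsafe for untrusted code; a real tool would sandbox this
    except Exception as exc:
        return [f"code failed to load: {exc!r}"]
    fn = namespace.get(entry)
    if not callable(fn):
        return [f"no callable named {entry!r}"]
    failures = []
    for args, expected in tests:
        try:
            got = fn(*args)
        except Exception as exc:
            failures.append(f"{entry}{args} raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"{entry}{args} returned {got!r}, expected {expected!r}")
    return failures


def generate_verify_loop(policy: Policy, max_turns: int = 3) -> Tuple[str, int, bool]:
    """Alternate generation and verification until the tests pass or turns run out."""
    feedback: List[str] = []
    code = ""
    for turn in range(1, max_turns + 1):
        code, tests = policy(feedback)      # model proposes code and its own tests
        feedback = run_tests(code, tests)   # tool returns execution feedback
        if not feedback:
            return code, turn, True
    return code, max_turns, False


def toy_policy(feedback: List[str]) -> Tuple[str, List[TestCase]]:
    """Scripted stand-in for the LLM: buggy first attempt, fixed after feedback."""
    tests = [((1, 2), 3), ((0, 0), 0)]
    if not feedback:
        return "def solve(a, b):\n    return a - b\n", tests
    return "def solve(a, b):\n    return a + b\n", tests
```

Calling `generate_verify_loop(toy_policy)` revises the code once and stops at the second turn. Passing self-written tests only shows agreement with those tests, which is why the quality of the generated tests matters as much as the quality of the code.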

Problem

Research questions and friction points this paper is trying to address.

Self-verification in LLM code generation is unreliable
Existing RL methods lack meaningful verification signals from realistic environments
Verification is not explicitly optimized during RL training

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-turn RL with self-verification and tool feedback
Autonomous test case generation and external tool invocation
Custom RL algorithm with dense per-turn rewards
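To make the idea of dense per-turn rewards concrete, here is a hedged sketch that contrasts with a single outcome reward at the end of an episode. The reward terms, the weights `w_gen` and `w_ver`, the discount `gamma`, and the `Turn` fields are all assumptions chosen for illustration; the source does not specify the paper's reward formula or RL algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    code_pass_rate: float  # hypothetical: fraction of reference tests the turn's code passes
    test_accuracy: float   # hypothetical: fraction of model-written tests judged valid


def per_turn_rewards(turns: List[Turn], w_gen: float = 0.5, w_ver: float = 0.5) -> List[float]:
    """Give every turn its own reward, mixing a generation and a verification term."""
    return [w_gen * t.code_pass_rate + w_ver * t.test_accuracy for t in turns]


def returns_to_go(rewards: List[float], gamma: float = 0.9) -> List[float]:
    """Discounted sum of the current and later rewards, computed for each turn."""
    out: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        out.append(running)
    out.reverse()
    return out
```

With a signal like this, an early turn that writes valid tests but wrong code still earns partial credit, whereas a sparse outcome reward would score it only through the final result.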