CP-Agent: A Calibrated Risk-Controlled Agent for Feedback-Driven Competitive Programming

📅 2026-05-23
🤖 AI Summary
Large language models perform poorly on competition-level programming tasks, and existing remedies often rely on extensive inference-time sampling or costly post-training. This work proposes a feedback-driven solving framework that requires no parameter updates and models the solving process as a calibrated stopped process. Under held-out trace calibration and a pre-declared finite controller manifest, its structural certificate ties false-admission risk control to a verifiable lower bound on the clean success probability. The method combines Dual-Granularity Verification, Test Augmentation, and Experience-Driven Self-Evolving. On LiveCodeBench Pro it raises Pass@1 from 25.8% to 48.5%, and on ICPC-Eval it improves Refine@5 by 11.0%; across three LLM backbones it lies on the cost-accuracy efficiency frontier.
📝 Abstract
Large language models still struggle with contest-level programming, while many agentic remedies rely on massive inference-time sampling or expensive multi-stage post-training. We study when execution feedback reliably helps an LLM CP solver and which mechanisms govern the gains. We model feedback-driven solving as a calibrated stopped process and identify three quantities: false-admission risk, program-level evidence against bad programs, and the active-state success hazard. Under held-out trace calibration and selection from a pre-declared finite controller manifest, the resulting structural certificate lower-bounds the clean success probability before false admission. We instantiate mechanisms targeting these quantities as Dual-Granularity Verification, Test Augmentation, and Experience-Driven Self-Evolving, yielding CP-Agent. Without updating any parameters, CP-Agent raises Pass@1 from 25.8% to 48.5% on LiveCodeBench Pro and improves Refine@5 by 11.0% on ICPC-Eval. Across three LLM backbones, CP-Agent lies on the cost-accuracy efficiency frontier, and ablations show that each component primarily affects its corresponding certificate quantity.
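To make the idea of held-out calibration over a pre-declared finite controller manifest more concrete, here is a minimal Python sketch. It is not the paper's certificate: it swaps in a generic Hoeffding bound with a union bound over the manifest, and the function names, the controller names, the false-admission counts, and the thresholds `alpha` and `delta` are all hypothetical values chosen for illustration.

```python
import math


def hoeffding_upper_bound(false_admissions, n_traces, n_controllers, delta):
    """Upper confidence bound on one controller's false-admission rate.

    Generic Hoeffding inequality plus a union bound over a finite manifest
    (an illustrative assumption, not the paper's construction). The bound
    holds for every controller simultaneously with probability >= 1 - delta.
    """
    empirical = false_admissions / n_traces
    slack = math.sqrt(math.log(n_controllers / delta) / (2 * n_traces))
    return min(1.0, empirical + slack)


def select_controllers(manifest, n_traces, alpha, delta):
    """Keep the controllers whose bounded false-admission risk is <= alpha.

    `manifest` maps a hypothetical controller name to the number of false
    admissions it produced on the held-out calibration traces.
    """
    k = len(manifest)
    return [
        name
        for name, false_admissions in manifest.items()
        if hoeffding_upper_bound(false_admissions, n_traces, k, delta) <= alpha
    ]


# Hypothetical calibration counts: false admissions out of 1000 held-out traces.
manifest = {"strict": 2, "balanced": 20, "lenient": 90}
admitted = select_controllers(manifest, n_traces=1000, alpha=0.10, delta=0.05)
print(admitted)
```

Because the manifest is fixed before calibration, the union bound only has to cover a known, finite number of controllers, which is what keeps the slack term small in this sketch.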
Problem

Research questions and friction points this paper is trying to address.

competitive programming
large language models
execution feedback
risk control
program verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

calibrated stopping process
risk-controlled agent
feedback-driven programming
dual-granularity verification
experience-driven self-evolving