Competitive Programming with Large Reasoning Models

📅 2025-02-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

It has been unclear whether strong competitive programming performance requires hand-engineered, domain-specific inference strategies or can come from general-purpose reasoning models alone. Method: This work compares two general-purpose reasoning models trained with reinforcement learning (RL), OpenAI o1 and an early checkpoint of o3, against o1-ioi, a domain-specific system that uses hand-engineered test-time strategies designed for the 2024 International Olympiad in Informatics (IOI). Contribution/Results: Competing live at IOI 2024, o1-ioi placed in the 49th percentile and reached a gold medal only under relaxed competition constraints. o3 achieves gold at IOI 2024 without hand-crafted domain-specific strategies or relaxed constraints, and obtains a Codeforces rating on par with elite human competitors. The results indicate that scaling general-purpose RL, rather than relying on domain-specific techniques, offers a robust path toward state-of-the-art performance in reasoning domains such as competitive programming.

📝 Abstract

We show that reinforcement learning applied to large language models (LLMs) significantly boosts performance on complex coding and reasoning tasks. Additionally, we compare two general-purpose reasoning models - OpenAI o1 and an early checkpoint of o3 - with a domain-specific system, o1-ioi, which uses hand-engineered inference strategies designed for competing in the 2024 International Olympiad in Informatics (IOI). We competed live at IOI 2024 with o1-ioi and, using hand-crafted test-time strategies, placed in the 49th percentile. Under relaxed competition constraints, o1-ioi achieved a gold medal. However, when evaluating later models such as o3, we find that o3 achieves gold without hand-crafted domain-specific strategies or relaxed constraints. Our findings show that although specialized pipelines such as o1-ioi yield solid improvements, the scaled-up, general-purpose o3 model surpasses those results without relying on hand-crafted inference heuristics. Notably, o3 achieves a gold medal at the 2024 IOI and obtains a Codeforces rating on par with elite human competitors. Overall, these results indicate that scaling general-purpose reinforcement learning, rather than relying on domain-specific techniques, offers a robust path toward state-of-the-art AI in reasoning domains, such as competitive programming.

Problem

Research questions and friction points this paper is trying to address.

Improving LLM performance on complex coding and reasoning tasks

Comparing general-purpose reasoning models with a domain-specific system

Reaching elite-level results in competitive programming (IOI 2024, Codeforces)

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning boosts LLM performance on coding and reasoning tasks

o3 surpasses o1-ioi without hand-crafted inference strategies

Scaled general-purpose models surpass domain-specific pipelines
