OpenCodeReasoning: Advancing Data Distillation for Competitive Coding

📅 2025-04-02
🤖 AI Summary
This work challenges the prevailing assumption that reinforcement learning (RL) is indispensable for surpassing supervised fine-tuning (SFT) in programming reasoning tasks (e.g., CodeContests, LiveCodeBench), focusing instead on constructing high-quality SFT datasets to enhance small- and medium-scale models. Method: We propose an instruction-decoupled data filtering paradigm, systematically revealing for the first time the detrimental impact of execution-based filtering on code reasoning distillation. We establish a new principle—“instruction diversity outweighs solution correctness”—and further refine data quality via token-efficiency analysis and reasoning-path validation. Contribution/Results: We open-source both a high-quality SFT dataset and corresponding models. Empirical results show that SFT-only models achieve 61.8% on LiveCodeBench and 24.6% on CodeContests—substantially outperforming same-scale RL baselines. This demonstrates that carefully curated SFT data is pivotal for advancing programming reasoning capabilities without RL.

📝 Abstract
Since the advent of reasoning-based large language models, many have found great success from distilling reasoning capabilities into student models. Such techniques have significantly bridged the gap between reasoning and standard LLMs on coding tasks. Despite this, much of the progress on distilling reasoning models remains locked behind proprietary datasets or lacks details on data curation, filtering and subsequent training. To address this, we construct a superior supervised fine-tuning (SFT) dataset that we use to achieve state-of-the-art coding capability results in models of various sizes. Our distilled models use only SFT to achieve 61.8% on LiveCodeBench and 24.6% on CodeContests, surpassing alternatives trained with reinforcement learning. We then perform analysis on the data sources used to construct our dataset, the impact of code execution filtering, and the importance of instruction/solution diversity. We observe that execution filtering negatively affected benchmark accuracy, leading us to prioritize instruction diversity over solution correctness. Finally, we also analyze the token efficiency and reasoning patterns utilized by these models. We will open-source these datasets and distilled models to the community.
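The abstract's central finding is that execution-based filtering (discarding distilled samples whose solutions fail test cases) hurt benchmark accuracy, so the authors kept diverse instructions even with unverified solutions. A minimal sketch of the two competing curation strategies is below; the function names, sample schema, and `solve()` convention are illustrative assumptions, not the paper's actual pipeline.

```python
# Hypothetical sketch contrasting two SFT-data curation strategies:
#   1. execution-based filtering: keep only solutions that pass their tests
#   2. diversity-first curation: keep every unique instruction, correct or not
# Sample format and helper names are assumptions for illustration.

def passes_tests(solution_src: str, tests: list[tuple[str, str]]) -> bool:
    """Exec the candidate solution and check it on (input, expected) pairs."""
    for test_input, expected in tests:
        namespace = {}
        try:
            exec(solution_src, namespace)          # defines solve()
            result = str(namespace["solve"](test_input))
        except Exception:
            return False
        if result.strip() != expected.strip():
            return False
    return True

def execution_filter(samples):
    """Drop any sample whose solution fails a test (the strategy that hurt accuracy)."""
    return [s for s in samples if passes_tests(s["solution"], s["tests"])]

def diversity_first(samples):
    """Keep one solution per unique instruction, regardless of correctness."""
    seen, kept = set(), []
    for s in samples:
        if s["instruction"] not in seen:
            seen.add(s["instruction"])
            kept.append(s)
    return kept

samples = [
    {"instruction": "Echo the input.",
     "solution": "def solve(x):\n    return x",
     "tests": [("abc", "abc")]},
    {"instruction": "Reverse the input.",
     "solution": "def solve(x):\n    return x",   # buggy: does not reverse
     "tests": [("abc", "cba")]},
]

print(len(execution_filter(samples)))   # buggy sample dropped -> 1
print(len(diversity_first(samples)))    # both instructions kept -> 2
```

Under the paper's finding, the second strategy retains the buggy-but-diverse sample, and that broader instruction coverage is what lifted downstream benchmark scores.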
Problem

Research questions and friction points this paper is trying to address.

Construct a high-quality SFT dataset for competitive coding without relying on proprietary data
Distill reasoning capabilities into student models using SFT alone, without reinforcement learning
Analyze how data sources and filtering choices (notably execution-based filtering) affect benchmark accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Instruction-decoupled filtering for SFT data curation, showing execution filtering harms distillation
Principle that instruction diversity outweighs solution correctness
Open-sourced SFT dataset and distilled models of various sizes