CoScale-RL: Efficient Post-Training by Co-Scaling Data and Computation

📅 2026-01-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the instability and unpredictable performance often encountered in post-training large reasoning models (LRMs), particularly on challenging tasks or with weak base models. To overcome these limitations, the authors propose CoScale-RL, a co-scaling strategy that enhances problem solvability through multi-solution sampling, improves training stability by extending reinforcement learning rollouts, and integrates a redistillation-based model fusion technique to maintain computational efficiency. Notably, the approach operates without reliance on large-scale supervised fine-tuning data, substantially expanding the reasoning capabilities of LRMs. Empirical evaluations across four benchmarks demonstrate an average accuracy improvement of 3.76×, highlighting significant gains in both data and compute utilization efficiency.

📝 Abstract
Training a Large Reasoning Model (LRM) is usually unstable and unpredictable, especially on hard problems or with weak foundation models. We found that the current post-training scaling strategy can still be improved in these cases. We propose CoScale-RL, a novel scaling strategy with better data and computational efficiency. We first scale up solutions to make problems solvable: the core idea is to collect multiple solutions for each problem, rather than simply enlarging the dataset. Then, we scale up rollout computation to stabilize Reinforcement Learning. We further leverage a model-merging technique called Re-distillation to sustain or even improve computational efficiency when scaling up. Our method significantly improves data and computational efficiency, with an average 3.76$\times$ accuracy improvement on four benchmarks. CoScale-RL is able to push an LRM's ability boundary without an extensive SFT dataset. Our method provides a new scaling direction for further improving an LRM's reasoning ability.
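The "scale up solutions" step described in the abstract — keeping a problem in the training set only once at least one of several sampled solutions verifies as correct — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `solver` and `verifier` stand-ins, the function names, and the sampling budget `k` are all hypothetical.

```python
import random

random.seed(0)  # deterministic toy run

def sample_solutions(problem, solver, k):
    """Draw k candidate solutions for one problem from a stochastic solver."""
    return [solver(problem) for _ in range(k)]

def build_solvable_set(problems, solver, verifier, k=8):
    """Keep (problem, correct_solutions) pairs where at least one of the
    k sampled candidates passes the verifier -- i.e., collect multiple
    solutions per problem instead of enlarging the problem set."""
    dataset = []
    for p in problems:
        correct = [s for s in sample_solutions(p, solver, k) if verifier(p, s)]
        if correct:  # problem counts as solvable under the k-sample budget
            dataset.append((p, correct))
    return dataset

# Toy stand-ins: each "problem" carries its answer; the solver guesses,
# and the verifier checks the guess against the known answer.
problems = [("2+2", 4), ("3*3", 9)]
solver = lambda p: random.randint(0, 10)
verifier = lambda p, s: s == p[1]

data = build_solvable_set(problems, solver, verifier, k=16)
```

Under this framing, raising `k` trades extra sampling compute for a larger fraction of solvable problems, which is the data-side half of the co-scaling trade-off the paper studies.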
Problem

Research questions and friction points this paper is trying to address.

Large Reasoning Model
post-training
training instability
reasoning ability
scaling strategy
Innovation

Methods, ideas, or system contributions that make the work stand out.

CoScale-RL
post-training scaling
reasoning model
re-distillation
computational efficiency