🤖 AI Summary
This work addresses the data scarcity bottleneck in training reasoning models, which stems from overreliance on scarce high-difficulty problems. We propose a paradigm that prioritizes reasoning chain length, rather than problem difficulty, as the core optimization dimension. To this end, we introduce a controllable synthetic data generation method that produces reasoning traces of precisely specified lengths. Empirically, we find that reasoning length exerts a significantly stronger influence on model performance than problem difficulty, and we establish a log-linear scaling law between length and performance. Leveraging this insight, we design a length-decoupled training strategy: fine-tuning Qwen2.5-32B-Instruct on merely 1,000 length-controlled samples yields Long1K-32B, which achieves 95.6% accuracy on MATH and 71.1% on GPQA, surpassing DeepSeek-R1-Distill-Qwen-32B. All code, datasets, and models are publicly released.
📝 Abstract
Difficult problems, which often yield long reasoning traces, are widely recognized as key to enhancing the performance of reasoning models. However, such high-challenge problems are scarce, limiting the size of available datasets. In this paper, we propose a simple method to remove this reliance on problem difficulty. First, we empirically demonstrate that reasoning length, rather than problem difficulty, primarily determines the performance of trained models. Second, we identify a scaling law for reasoning length, showing that model performance increases log-linearly as the length of the reasoning data grows. Finally, we introduce a straightforward technique for generating reasoning data of arbitrary length, and show that the synthesized data is effective for training reasoning models. After fine-tuning the Qwen2.5-32B-Instruct language model on our Long1K dataset, we present Long1K-32B, which achieves remarkable performance with only 1,000 training samples: 95.6% accuracy on MATH and 71.1% on GPQA, outperforming DeepSeek-R1-Distill-Qwen-32B. The model, code, and dataset are all open-sourced, available at https://huggingface.co/ZTss/LONG1.
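To make the log-linear scaling claim concrete, here is a minimal sketch of what fitting such a law looks like: accuracy modeled as a linear function of the logarithm of reasoning length. The (length, accuracy) pairs below are invented purely for illustration and are not the paper's measurements.

```python
import numpy as np

# Hypothetical (reasoning length, accuracy) pairs illustrating a log-linear
# trend; these numbers are invented for demonstration, not from the paper.
lengths = np.array([1_000, 2_000, 4_000, 8_000, 16_000], dtype=float)
accuracy = np.array([0.60, 0.66, 0.72, 0.78, 0.84])

# Fit accuracy ~ a * log(length) + b, i.e. a straight line in log-length.
a, b = np.polyfit(np.log(lengths), accuracy, deg=1)

def predict(length: float) -> float:
    """Predicted accuracy under the fitted log-linear law."""
    return a * np.log(length) + b
```

Under this model, each doubling of reasoning length adds a constant increment (`a * ln 2`) to predicted accuracy, which is the signature of a log-linear relationship.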