🤖 AI Summary
A critical bottleneck for scaling reasoning capabilities beyond mathematics and programming is the scarcity of diverse, high-quality open-domain reasoning problems.
Method: We propose the first large-scale automated framework for generating real-world reasoning tasks across multiple disciplines, including STEM, economics, and the social sciences, yielding a high-quality dataset of 2.8 million reasoning problems. The pipeline integrates strong teacher-model prompting, multi-stage quality filtering, external and self-reward mechanisms, structured answer alignment, and consistency verification, and it supports both unsupervised self-training and knowledge distillation.
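The multi-stage filtering and consistency verification described above can be sketched roughly as follows. This is an illustrative outline only: the stage logic, thresholds, and data fields are assumptions for the sake of the example, not the framework's actual implementation.

```python
# Hypothetical sketch of a multi-stage quality-filtering pipeline.
# All names and thresholds here are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Candidate:
    question: str
    reference_answer: str
    model_answers: list = field(default_factory=list)  # sampled teacher answers

def is_well_formed(c: Candidate) -> bool:
    # Stage 1: drop degenerate items (trivially short questions, empty answers).
    return len(c.question.split()) >= 5 and bool(c.reference_answer)

def is_consistent(c: Candidate) -> bool:
    # Stage 2: consistency verification -- a majority of independently
    # sampled answers must agree with the aligned reference answer.
    if not c.model_answers:
        return False
    matches = sum(a.strip().lower() == c.reference_answer.strip().lower()
                  for a in c.model_answers)
    return matches / len(c.model_answers) >= 0.5

def filter_pipeline(candidates):
    # Apply stages in order; later stages only see earlier survivors.
    return [c for c in candidates if is_well_formed(c) and is_consistent(c)]
```

In a real system each stage would call a model (e.g., an LLM judge or a reward model) rather than simple string checks, but the staged survivor-passing structure is the same.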
Contribution/Results: Empirical evaluation demonstrates that models fine-tuned on this dataset achieve significantly improved generalization across complex, cross-domain reasoning tasks. Distilled student models attain performance close to that of their teacher models, while self-training enables stable reasoning capability gains even in fully unsupervised, zero-label settings.
📝 Abstract
Scaling reasoning capabilities beyond traditional domains such as math and coding is hindered by the lack of diverse, high-quality questions. To overcome this limitation, we introduce a scalable approach for generating diverse and challenging reasoning questions, accompanied by reference answers. We present NaturalReasoning, a comprehensive dataset of 2.8 million questions spanning multiple domains, including STEM fields (e.g., Physics, Computer Science), Economics, Social Sciences, and more. We demonstrate the utility of the questions in NaturalReasoning through knowledge distillation experiments, which show that the dataset can effectively elicit and transfer reasoning capabilities from a strong teacher model. Furthermore, we demonstrate that NaturalReasoning is also effective for unsupervised self-training using external reward models or self-rewarding.
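The unsupervised self-training mentioned above can be sketched as reward-guided best-of-n selection: sample several responses per question, keep the highest-reward one as a pseudo-label, and fine-tune on the resulting pairs. The `generate` and `reward` callables below are placeholders for a policy model and an external (or self-) reward model; this is a minimal sketch under those assumptions, not the paper's actual training code.

```python
# Illustrative sketch of reward-guided self-training via best-of-n selection.
# `generate` and `reward` are hypothetical stand-ins for a policy model and
# a reward model; no real APIs are assumed.
def select_for_self_training(questions, generate, reward, n_samples=4):
    """Return (question, best_response) pairs for fine-tuning.

    No gold labels are used: the reward model alone ranks the
    model's own sampled responses (zero-label setting).
    """
    training_pairs = []
    for q in questions:
        responses = [generate(q) for _ in range(n_samples)]
        best = max(responses, key=lambda r: reward(q, r))
        training_pairs.append((q, best))
    return training_pairs
```

With a self-rewarding setup, `reward` would itself be the policy model prompted to judge its own outputs; the selection loop is unchanged.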