Reverse-Engineered Reasoning for Open-Ended Generation

📅 2025-09-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Addressing three obstacles to deep reasoning in open-domain creative generation (the difficulty of modeling open-ended reasoning, the scarcity of reliable reward signals for reinforcement learning, and the high cost and teacher-capped quality of knowledge distillation), this paper proposes REER, a reverse-engineered reasoning paradigm. REER directly infers interpretable, step-by-step reasoning trajectories from existing high-quality outputs, eliminating the need for reward models or teacher models. By combining large-scale sampling with systematic path mining, the authors construct DeepWriting-20K, a high-quality open-domain dataset of 20,000 deep reasoning trajectories, and use it to train DeepWriter-8B, an 8-billion-parameter model. Experiments show that DeepWriter-8B significantly outperforms leading open-source models across multiple open-ended generation benchmarks, matching or exceeding GPT-4o and Claude 3.5. Crucially, REER makes automatic deep-reasoning data generation gradient-free, scalable, and low-cost for the first time.
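The summary does not spell out the search procedure, but the core idea (work backwards from a known-good output `y` to a reasoning trajectory `z` that makes `y` likely, with no gradients and no teacher) can be illustrated with a minimal sketch. The `propose` and `score` functions below are hypothetical stand-ins: in the actual system one would expect an LLM to propose segment rewrites and a language-model perplexity of `y` given `(x, z)` to serve as the score.

```python
def reverse_engineer_trajectory(x, y, propose, score, init, n_iters=20):
    """Gradient-free local search for a reasoning trajectory z that best
    explains a known-good output y for prompt x.

    propose(x, y, z, i) -> candidate rewrite of segment i of z
    score(x, z, y)      -> quality of z (e.g. perplexity of y given x, z);
                           lower is better
    """
    z = list(init)
    best = score(x, z, y)
    for t in range(n_iters):
        i = t % len(z)                 # refine segments round-robin
        cand = z.copy()
        cand[i] = propose(x, y, z, i)  # sample a replacement for segment i
        s = score(x, cand, y)
        if s < best:                   # greedy accept: keep only improvements
            z, best = cand, s
    return z, best


# Toy demonstration with deterministic stand-ins (no LLM involved):
target = ["plan structure", "draft opening", "develop argument", "polish ending"]
propose = lambda x, y, z, i: target[i]                           # mock proposer
score = lambda x, z, y: sum(a != b for a, b in zip(z, target))   # mock perplexity
z, best = reverse_engineer_trajectory("prompt", "essay", propose, score, ["?"] * 4)
```

In the toy run the search converges to the target trajectory with score 0; with a real proposer and perplexity scorer the same loop yields trajectories of the kind mined for DeepWriting-20K.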

📝 Abstract
While the "deep reasoning" paradigm has spurred significant advances in verifiable domains like mathematics, its application to open-ended, creative generation remains a critical challenge. The two dominant methods for instilling reasoning -- reinforcement learning (RL) and instruction distillation -- falter in this area; RL struggles with the absence of clear reward signals and high-quality reward models, while distillation is prohibitively expensive and capped by the teacher model's capabilities. To overcome these limitations, we introduce REverse-Engineered Reasoning (REER), a new paradigm that fundamentally shifts the approach. Instead of building a reasoning process "forwards" through trial-and-error or imitation, REER works "backwards" from known-good solutions to computationally discover the latent, step-by-step deep reasoning process that could have produced them. Using this scalable, gradient-free approach, we curate and open-source DeepWriting-20K, a large-scale dataset of 20,000 deep reasoning trajectories for open-ended tasks. Our model, DeepWriter-8B, trained on this data, not only surpasses strong open-source baselines but also achieves performance competitive with, and at times superior to, leading proprietary models like GPT-4o and Claude 3.5.
Problem

Research questions and friction points this paper is trying to address.

Instilling deep reasoning in open-ended, creative generation
Limitations of RL (no reliable reward signal) and distillation (cost, teacher ceiling)
Recovering latent reasoning processes by working backwards from known-good solutions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reverse-engineered reasoning paradigm that infers trajectories from known-good solutions
Scalable, gradient-free discovery of latent reasoning processes
DeepWriting-20K, a large-scale trajectory dataset for open-ended reasoning tasks