ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement

📅 2024-10-03
🏛️ arXiv.org
📈 Citations: 2 · Influential: 0
🤖 AI Summary
Large language models (LLMs) struggle to improve their reasoning without supervision: self-training methods generalize poorly out of distribution (OOD), and high-quality reasoning traces typically require manual annotation or a stronger teacher model. Method: This paper proposes an "abstract → concrete" hierarchical self-synthesis paradigm that leverages the LLM's intrinsic reasoning capabilities. A three-stage framework (reasoning-guideline concretization, reasoning-structure generation, and path instantiation) transforms generic reasoning principles into high-quality, task-adapted reasoning traces, without human examples or external supervision. Contribution/Results: The core innovation is decoupling abstract guidance from concrete execution, which substantially improves OOD generalization. Across six out-of-domain reasoning tasks, the method achieves an average improvement of 6.1%, while baselines degrade by 4.6% on average. Results hold across diverse LLMs and design choices.

📝 Abstract
Post-training Large Language Models (LLMs) with explicit reasoning trajectories can enhance their reasoning abilities. However, acquiring such high-quality trajectory data typically demands meticulous supervision from humans or superior models, which can be either expensive or license-constrained. In this paper, we explore how far an LLM can improve its reasoning by self-synthesizing reasoning paths as training data, without any additional supervision. Existing self-synthesizing methods, such as STaR, suffer from poor generalization to out-of-domain (OOD) reasoning tasks. We hypothesize this is because their self-synthesized reasoning paths are too task-specific, lacking general, task-agnostic reasoning guidance. To address this, we propose Reasoning Generalist via Self-Improvement (ReGenesis), a method to self-synthesize reasoning paths as post-training data by progressing from abstract to concrete. More specifically, ReGenesis self-synthesizes reasoning paths by converting general reasoning guidelines into task-specific ones, generating reasoning structures, and subsequently transforming these structures into reasoning paths, without the need for the human-designed task-specific examples used in existing methods. We show that ReGenesis achieves superior performance in all in-domain and OOD settings tested, compared to existing methods. On six OOD tasks specifically, while previous methods exhibit an average performance decrease of approximately 4.6% after post-training, ReGenesis delivers around a 6.1% performance improvement. We also conduct an in-depth analysis of our framework and show that ReGenesis is effective across various LLMs and design choices.
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM reasoning without human supervision
Improving generalization in out-of-domain reasoning tasks
Self-synthesizing abstract-to-concrete reasoning paths
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-synthesizing reasoning paths without supervision
Progressing from abstract to concrete reasoning
Converting general guidelines into task-specific ones
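The abstract-to-concrete pipeline above can be sketched roughly as follows. This is an illustrative sketch only: the `llm` stub, the guideline list, the prompt wording, and the answer-matching filter are stand-ins, not the paper's actual prompts or implementation.

```python
# Hypothetical sketch of ReGenesis-style abstract-to-concrete self-synthesis.
# `llm` stands in for any text-completion model; here it is a deterministic
# stub so the pipeline is runnable end to end.

GENERAL_GUIDELINES = [
    "Break the problem into smaller sub-problems.",
    "Restate the problem to check your understanding.",
    "Work backwards from the desired answer.",
]

def llm(prompt: str) -> str:
    """Stub LLM: echoes a tagged prefix of the prompt."""
    return f"[response to: {prompt[:40]}...]"

def synthesize_reasoning_paths(question: str, answer: str) -> list[str]:
    """Self-synthesize candidate reasoning paths for one training question."""
    paths = []
    for guideline in GENERAL_GUIDELINES:
        # Stage 1: concretize the general guideline for this specific task.
        specific = llm(
            f"Adapt this guideline to the task.\n"
            f"Guideline: {guideline}\nTask: {question}"
        )
        # Stage 2: expand the task-specific guidance into a reasoning
        # structure (ordered steps, no final answer yet).
        structure = llm(
            f"Turn this guidance into a step-by-step reasoning structure:\n"
            f"{specific}"
        )
        # Stage 3: instantiate the structure into a full reasoning path.
        path = llm(
            f"Follow this structure to solve the task:\n"
            f"{structure}\nTask: {question}"
        )
        # Keep only paths that reach the known answer, so correct
        # self-generated traces become post-training data (a simplified
        # version of the filtering described in the paper).
        if answer in path:
            paths.append(path)
    return paths
```

Because each of the three stages is a separate generation, the abstract guidance stays decoupled from concrete execution, which is the property the paper credits for the OOD gains.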
👥 Authors
Xiangyu Peng (Salesforce AI Research)
Congying Xia (Meta GenAI)
Xinyi Yang (Salesforce AI Research)
Caiming Xiong (Salesforce Research)
Chien-Sheng Wu (Salesforce AI Research)
Chen Xing (Salesforce AI Research)