System-2 Mathematical Reasoning via Enriched Instruction Tuning

📅 2024-12-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) exhibit weak multi-step logical reasoning on complex mathematical problems, and existing remedies often rely on symbolic verification or external tools. Method: This paper proposes a two-stage enrichment paradigm—Enriching with Reasoning Plan (ERP) and Enriching with Reasoning Step (ERS)—that leverages human-annotated initial answers as meta-knowledge to guide the generation of accurate, traceable reasoning chains. Combined as Enriched Instruction Tuning (EIT), the approach enables effective supervised fine-tuning without external tools or formal verification. Contribution/Results: The method overcomes the limitations of pure prompt engineering and tool dependency, achieving 84.1% accuracy on GSM8K and 32.5% on MATH—surpassing state-of-the-art fine-tuning and chain-of-thought prompting methods, and matching the performance of tool-augmented models.

📝 Abstract
Solving complex mathematical problems via system-2 reasoning is a natural human skill, yet it remains a significant challenge for current large language models (LLMs). We identify the scarcity of deliberate multi-step reasoning data as a primary limiting factor. To this end, we introduce Enriched Instruction Tuning (EIT), a method that enriches existing human-annotated mathematical datasets by synergizing human and AI feedback to create fine-grained reasoning trajectories. These datasets are then used to fine-tune open-source LLMs, enhancing their mathematical reasoning abilities without reliance on any symbolic verification program. Concretely, EIT is composed of two critical steps: Enriching with Reasoning Plan (ERP) and Enriching with Reasoning Step (ERS). The former generates a high-level plan that breaks down complex instructions into a sequence of simpler objectives, while ERS fills in reasoning contexts often overlooked by human annotators, creating a smoother reasoning trajectory for LLM fine-tuning. Unlike existing CoT prompting methods that generate reasoning chains only depending on LLM's internal knowledge, our method leverages human-annotated initial answers as ``meta-knowledge'' to help LLMs generate more detailed and precise reasoning processes, leading to a more trustworthy LLM expert for complex mathematical problems. In experiments, EIT achieves an accuracy of 84.1% on GSM8K and 32.5% on MATH, surpassing state-of-the-art fine-tuning and prompting methods, and even matching the performance of tool-augmented methods.
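The two enrichment steps described above can be sketched as a small data pipeline: ERP drafts a high-level plan from the question and the human initial answer, ERS expands that plan into a gap-free reasoning trajectory, and the enriched pair becomes a fine-tuning example. This is a minimal illustrative sketch only—the prompt wording, function names, and the `llm` callable are assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the EIT enrichment pipeline (ERP -> ERS).
# All prompt text and helper names below are illustrative assumptions.

def enrich_with_reasoning_plan(llm, question, initial_answer):
    """ERP: use the human initial answer as 'meta-knowledge' to draft a
    high-level plan that breaks the problem into simpler sub-goals."""
    prompt = (
        f"Question: {question}\n"
        f"Reference answer: {initial_answer}\n"
        "Break the solution into a numbered list of simple sub-goals."
    )
    return llm(prompt)

def enrich_with_reasoning_step(llm, question, plan, initial_answer):
    """ERS: fill in intermediate reasoning steps that human annotators
    often omit, yielding a smoother trajectory for fine-tuning."""
    prompt = (
        f"Question: {question}\n"
        f"Plan: {plan}\n"
        f"Reference answer: {initial_answer}\n"
        "Write out every intermediate step so the chain has no gaps."
    )
    return llm(prompt)

def build_training_example(llm, question, initial_answer):
    plan = enrich_with_reasoning_plan(llm, question, initial_answer)
    trajectory = enrich_with_reasoning_step(llm, question, plan, initial_answer)
    # The enriched (instruction, response) pair is what EIT fine-tunes on.
    return {"instruction": question, "response": f"{plan}\n{trajectory}"}

# Stand-in model so the sketch runs without any API access.
def toy_llm(prompt):
    return "[generated text for: " + prompt.splitlines()[0] + "]"

example = build_training_example(
    toy_llm,
    "Natalia sold clips to 48 friends and half as many more. How many total?",
    "48 + 24 = 72",
)
```

In the paper's setting, `llm` would be a strong generator producing AI feedback, and the resulting examples would be used to supervise fine-tune an open-source LLM; the stub here only demonstrates the data flow.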
Problem

Research questions and friction points this paper is trying to address.

Complex Mathematical Problems
Language Models
Performance Enhancement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enriched Instruction Tuning
Mathematical Problem Solving
Language Models
Huanqia Cai
Tencent
Yijun Yang
Tencent, University of Technology Sydney
Zhifeng Li
Tencent
computer vision, pattern recognition, with a recent focus on AIGC