Self-Error-Instruct: Generalizing from Errors for LLMs Mathematical Reasoning

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) frequently err in mathematical reasoning, and conventional error-correction methods operate on isolated incorrect instances without extracting generalizable error patterns. Method: We propose “error generalization”—a novel paradigm that (1) extracts salient erroneous phrases, clusters them semantically via Sentence-BERT and K-means to identify recurrent error types; (2) employs GPT-4o to perform self-instructed data synthesis and single-round refinement on representative mispredictions, yielding structured, transferable correction examples; and (3) iteratively fine-tunes the target model using these samples. Contribution/Results: This is the first work to abstract generalizable error patterns from failures and leverage them for high-quality, pattern-aware data construction. Evaluated on GSM8K and MATH, our method significantly improves accuracy across multiple LLMs and demonstrates strong out-of-distribution generalization, validating the efficacy and universality of error-driven data curation.
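The clustering step in the summary (embed erroneous phrases, then group them with K-means to surface recurrent error types) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the keyphrases are invented examples, and TF-IDF vectors stand in for the Sentence-BERT embeddings the paper uses.

```python
# Sketch of the error-type clustering step: embed error keyphrases and
# group them with K-means. TF-IDF is a lightweight stand-in for the
# paper's Sentence-BERT embeddings; the keyphrases are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

keyphrases = [
    "dropped a negative sign",
    "sign error when distributing",
    "misread the fraction as a decimal",
    "confused numerator and denominator",
    "forgot to carry in addition",
    "arithmetic slip while adding",
]

# Embed the keyphrases (Sentence-BERT in the paper; TF-IDF here).
X = TfidfVectorizer().fit_transform(keyphrases)

# Cluster into error types; the number of clusters would normally be
# tuned (e.g., via silhouette score) rather than fixed.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Group phrases by assigned cluster to inspect the recovered error types.
clusters = {}
for phrase, label in zip(keyphrases, km.labels_):
    clusters.setdefault(int(label), []).append(phrase)
for label, members in sorted(clusters.items()):
    print(label, members)
```

Each cluster then serves as one "error type" for which representative bad cases are sampled and handed to the instructor model.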

📝 Abstract
Although large language models demonstrate strong performance across various domains, they still struggle with numerous bad cases in mathematical reasoning. Previous approaches to learning from errors synthesize training data by solely extrapolating from isolated bad cases, thereby failing to generalize the extensive patterns inherent within these cases. This paper presents Self-Error-Instruct (SEI), a framework that addresses these model weaknesses and synthesizes more generalized targeted training data. Specifically, we explore a target model on two mathematical datasets, GSM8K and MATH, to pinpoint bad cases. Then, we generate error keyphrases for these cases based on the instructor model's (GPT-4o) analysis and identify error types by clustering these keyphrases. Next, we sample a few bad cases during each generation for each identified error type and input them into the instructor model, which synthesizes additional training data using a self-instruct approach. This new data is refined through a one-shot learning process to ensure that only the most effective examples are kept. Finally, we use these curated data to fine-tune the target model, iteratively repeating the process to enhance performance. We apply our framework to various models and observe improvements in their reasoning abilities across both in-domain and out-of-domain mathematics datasets. These results demonstrate the effectiveness of self-error instruction in improving LLMs' mathematical reasoning through error generalization.
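The iterative loop described in the abstract (find bad cases, group them by error type, have the instructor synthesize and refine targeted data, fine-tune, repeat) can be made concrete with a control-flow skeleton. Everything below is a hypothetical stand-in: `toy_model`, the `error_tag` field, and the stub functions are illustrative only; in the paper, GPT-4o writes and clusters error keyphrases, synthesizes data via self-instruct, and the target LLM is fine-tuned on the refined output.

```python
# Skeleton of one Self-Error-Instruct (SEI) round. All functions are
# simplified stand-ins for the paper's GPT-4o-based pipeline.

def find_bad_cases(model, dataset):
    """Step 1: run the target model and keep the problems it gets wrong."""
    return [ex for ex in dataset if model(ex["question"]) != ex["answer"]]

def group_by_error_type(bad_cases):
    """Step 2 (stub): the paper clusters GPT-4o-generated error
    keyphrases; here we group by a precomputed illustrative tag."""
    groups = {}
    for case in bad_cases:
        groups.setdefault(case["error_tag"], []).append(case)
    return groups

def synthesize_training_data(error_type, cases, per_type=2):
    """Steps 3-4 (stub): the instructor model self-instructs new examples
    per error type, then a one-shot refinement keeps only effective ones.
    Here we just emit labeled variants of sampled cases."""
    return [{"q": c["question"] + " (synthesized variant)", "a": c["answer"]}
            for c in cases[:per_type]]

def sei_round(model, dataset):
    """One SEI iteration; the returned data would fine-tune `model`,
    after which the whole loop repeats."""
    bad = find_bad_cases(model, dataset)
    new_data = []
    for etype, cases in group_by_error_type(bad).items():
        new_data += synthesize_training_data(etype, cases)
    return new_data

# Toy demo: a "model" that always answers 0 fails on every problem,
# so both error types yield synthesized data.
dataset = [
    {"question": "2+3", "answer": 5, "error_tag": "arithmetic"},
    {"question": "7-4", "answer": 3, "error_tag": "sign"},
]
toy_model = lambda q: 0
print(len(sei_round(toy_model, dataset)))
```

The key design point the skeleton captures is that synthesis is driven by error *types*, not individual bad cases, which is what lets the curated data generalize beyond the observed failures.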
Problem

Research questions and friction points this paper is trying to address.

Addresses LLMs' struggles with mathematical reasoning errors
Generalizes error patterns from bad cases for better training
Improves models' reasoning via iterative error-based data refinement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates error keyphrases via GPT-4o analysis
Clusters keyphrases to identify error types
Refines data via one-shot learning iteration
👥 Authors
Erxin Yu — The Hong Kong Polytechnic University
Jing Li — Department of Computing, The Hong Kong Polytechnic University; Research Centre for Data Science & Artificial Intelligence
Ming Liao — Department of Computing, The Hong Kong Polytechnic University
Qi Zhu — Huawei Noah's Ark Lab
Boyang Xue — Ph.D. Candidate, The Chinese University of Hong Kong (Natural Language Processing, Large Language Models, Speech Recognition)
Minghui Xu — Huawei Noah's Ark Lab
Baojun Wang — Huawei Noah's Ark Lab (NLP)
Lanqing Hong — Huawei Noah's Ark Lab
Fei Mi — Huawei Noah's Ark Lab (LLM Post-Training)
Lifeng Shang — Huawei Noah's Ark Lab (Machine Learning, Computer Vision, Pattern Recognition, Natural Language Processing)