WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

📅 2023-08-18
🏛️ arXiv.org
📈 Citations: 321
Influential: 58
🤖 AI Summary
Large language models (LLMs) exhibit weak chain-of-thought (CoT) reasoning on complex mathematics, rely heavily on external tools, and lack explicit supervision of their reasoning processes. Method: The paper proposes Reinforcement Learning from Evol-Instruct Feedback (RLEIF), a framework that jointly optimizes instruction evolution and stepwise reasoning supervision. Built on open-source base models (e.g., Mistral, Gemma), RLEIF combines evolutionary instruction tuning, process-supervised reinforcement learning, and distillation of high-quality mathematical CoT data to enable fully language-based, end-to-end mathematical problem solving. Contribution/Results: The resulting WizardMath-7B achieves state-of-the-art performance among open-source models of comparable size (92.3% on GSM8K and 52.4% on MATH), and its 70B variant surpasses GPT-3.5-Turbo, Claude 2, and an early version of GPT-4, empirically validating the effectiveness and scalability of process-oriented reasoning optimization.
📝 Abstract
Large language models (LLMs), such as GPT-4, have shown remarkable performance in natural language processing (NLP) tasks, including challenging mathematical reasoning. However, most existing open-source models are only pre-trained on large-scale internet data, without math-related optimization. In this paper, we present WizardMath, which enhances the mathematical CoT reasoning abilities of LLMs without using external Python tools, by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. Through extensive experiments on two mathematical reasoning benchmarks, namely GSM8K and MATH, we reveal the extraordinary capabilities of our model. Remarkably, WizardMath-Mistral 7B surpasses top-tier open-source LLMs by a substantial margin with higher data efficiency. Furthermore, WizardMath 70B even outperforms GPT-3.5-Turbo, Claude 2, Gemini Pro, and an early version of GPT-4. Additionally, our preliminary exploration highlights the pivotal role of instruction evolution and process supervision in achieving exceptional math performance. For more details, refer to https://github.com/nlpxucan/WizardLM
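The "process supervision" the abstract highlights can be illustrated with a minimal sketch: a process reward model (PRM) scores each reasoning step, and the per-step scores are aggregated into one reward for the whole solution. Here `score_step` is a hypothetical stand-in for a learned PRM, and the aggregation rule (minimum over steps) follows common process-supervision practice; neither is the paper's exact formulation.

```python
# Minimal sketch of process-supervised reward scoring (assumptions noted above).
# A real PRM is a trained model; this stand-in just rewards explicit equations.

def score_step(step: str) -> float:
    """Stand-in PRM: favors steps that state an explicit equation."""
    return 0.9 if "=" in step else 0.3

def process_reward(solution_steps: list[str]) -> float:
    """Aggregate per-step scores; a single weak step sinks the whole reward."""
    if not solution_steps:
        return 0.0
    return min(score_step(s) for s in solution_steps)

steps = ["2 apples + 3 apples = 5 apples", "So the answer is 5."]
r = process_reward(steps)  # r == 0.3, dragged down by the equation-free step
```

The min-aggregation illustrates why stepwise supervision differs from outcome-only rewards: a solution with a correct final answer but a flawed intermediate step still receives a low score.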
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Mathematical Understanding
Complex Problem Solving
Innovation

Methods, ideas, or system contributions that make the work stand out.

RLEIF Training Method
Enhanced Mathematical Understanding
Process Supervision
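The instruction-evolution half of the RLEIF training method listed above can be sketched as a loop that rewrites existing math problems into harder or diversified variants. The prompt templates and the `call_llm` placeholder below are illustrative assumptions, not the paper's actual prompts or pipeline.

```python
# Hypothetical sketch of Evol-Instruct-style instruction evolution.
# `call_llm` is a placeholder for any instruction-tuned LLM call; here it
# just tags the input so the sketch runs without a model.

UPWARD = ("Rewrite the following math problem so it requires one more "
          "reasoning step, without changing its topic:\n{q}")
DOWNWARD = ("Write a new, easier math problem on the same topic as:\n{q}")

def call_llm(prompt: str) -> str:
    # Placeholder: in practice, send the prompt to an LLM and return its reply.
    return "EVOLVED: " + prompt.splitlines()[-1]

def evolve(seed_questions: list[str], rounds: int = 2) -> list[str]:
    """Grow an instruction pool by alternating upward/downward evolution."""
    pool = list(seed_questions)
    for r in range(rounds):
        template = UPWARD if r % 2 == 0 else DOWNWARD
        latest = pool[-len(seed_questions):]  # evolve the newest generation
        pool += [call_llm(template.format(q=q)) for q in latest]
    return pool
```

Each round feeds the newest generation of problems back through a rewriting prompt, so the pool's difficulty and diversity grow with the number of rounds.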
Haipeng Luo
Shenzhen International Graduate School, Tsinghua University
Qingfeng Sun
Tencent Hunyuan X
Natural Language Processing
Can Xu
Microsoft Corporation
Pu Zhao
Microsoft Corporation
Jian-Guang Lou
Microsoft Corporation
Chongyang Tao
Associate Professor of Computer Science, Beihang University
Natural Language Processing · Dialogue Systems · Information Retrieval · Data Intelligence
Xiubo Geng
Microsoft Corporation
Qingwei Lin
Microsoft
Shifeng Chen
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
Dongmei Zhang
Microsoft Research
Software Engineering · Machine Learning · Information Visualization