🤖 AI Summary
Large language models (LLMs) exhibit weak chain-of-thought (CoT) reasoning on complex mathematics, rely heavily on external tools, and lack explicit supervision of their reasoning processes.
Method: We propose Reinforcement Learning from Evol-Instruct Feedback (RLEIF), a framework that jointly optimizes instruction evolution and stepwise reasoning supervision. Built upon open-source base models (e.g., Mistral, Gemma), RLEIF integrates evolutionary instruction tuning, process-supervised reinforcement learning, and distillation of high-quality mathematical CoT data to enable fully language-based, end-to-end mathematical problem solving.
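The process-supervised reward described above can be sketched as combining an instruction-quality score with per-step reasoning scores. This is a minimal, hypothetical illustration: the two scoring functions are toy stand-ins for learned reward models (the paper trains an instruction reward model and a process reward model), and the multiplicative aggregation is one plausible choice, not the paper's exact formula.

```python
# Hedged sketch of process-supervised reward aggregation.
# instruction_reward and step_rewards are toy proxies for learned
# reward models (IRM / PRM); real systems would use trained scorers.

def instruction_reward(instruction: str) -> float:
    """Toy stand-in: score instruction quality in [0, 1]."""
    return min(1.0, len(instruction.split()) / 20)

def step_rewards(steps: list[str]) -> list[float]:
    """Toy stand-in: score each CoT step's correctness in [0, 1]."""
    return [0.9 if s.strip() else 0.0 for s in steps]

def combined_reward(instruction: str, steps: list[str]) -> float:
    # Multiply the instruction score by the product of per-step
    # scores, so a single bad reasoning step drags the reward down.
    r = instruction_reward(instruction)
    for r_step in step_rewards(steps):
        r *= r_step
    return r

reward = combined_reward(
    "Solve for x: 2x + 3 = 11. Show each step.",
    ["Subtract 3 from both sides: 2x = 8.", "Divide by 2: x = 4."],
)
```

In an RL loop, this scalar would serve as the return signal for a policy-gradient update (e.g., PPO) on the generator model.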
Contribution/Results: The resulting WizardMath-7B achieves state-of-the-art performance among open-source models of comparable size: 92.3% on GSM8K and 52.4% on MATH. Its 70B variant surpasses GPT-3.5-Turbo, Claude 2, and an early version of GPT-4, empirically validating the effectiveness and scalability of process-oriented reasoning optimization.
📝 Abstract
Large language models (LLMs), such as GPT-4, have shown remarkable performance in natural language processing (NLP) tasks, including challenging mathematical reasoning. However, most existing open-source models are only pre-trained on large-scale internet data, without math-related optimization. In this paper, we present WizardMath, which enhances the mathematical CoT reasoning abilities of LLMs without using external Python tools, by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. Through extensive experiments on two mathematical reasoning benchmarks, namely GSM8K and MATH, we reveal the extraordinary capabilities of our model. Remarkably, WizardMath-Mistral 7B surpasses top-tier open-source LLMs by a substantial margin with higher data efficiency. Furthermore, WizardMath 70B even outperforms GPT-3.5-Turbo, Claude 2, Gemini Pro, and an early version of GPT-4. Additionally, our preliminary exploration highlights the pivotal role of instruction evolution and process supervision in achieving exceptional math performance. For more details, refer to https://github.com/nlpxucan/WizardLM