🤖 AI Summary
To address the noise and spurious correlations in self-generated data for mathematical reasoning with large language models (LLMs), which stem from a scarcity of high-quality queries, this paper proposes MARGE, a hit-guided exploration method. MARGE explores intermediate reasoning states derived from self-generated solutions and uses sparse feedback (whether an intermediate result hits a correct sub-goal) to steer search trajectories and improve credit assignment throughout the reasoning process. It requires no external annotations or auxiliary value models, and it mitigates the common trade-off between accuracy and exploration diversity in alignment methods. Evaluated across multiple backbone models and benchmarks, MARGE significantly improves both single-shot accuracy and the diversity of effective reasoning paths, demonstrating the feasibility and effectiveness of scaling up low-noise, self-generated training data.
📝 Abstract
Large Language Models (LLMs) exhibit strong potential in mathematical reasoning, yet their effectiveness is often limited by a shortage of high-quality queries. This limitation necessitates scaling up responses through self-generated data, yet current methods struggle with spuriously correlated data caused by ineffective exploration across all reasoning stages. To address this challenge, we introduce **MARGE**: Improving **Ma**th **R**easoning with **G**uided **E**xploration, a novel method that enhances mathematical reasoning through hit-guided exploration. MARGE systematically explores intermediate reasoning states derived from self-generated solutions, enabling adequate exploration and improved credit assignment throughout the reasoning process. Through extensive experiments across multiple backbone models and benchmarks, we demonstrate that MARGE significantly improves reasoning capabilities without requiring external annotations or training additional value models. Notably, MARGE improves both single-shot accuracy and exploration diversity, mitigating a common trade-off in alignment methods. These results demonstrate MARGE's effectiveness in enhancing mathematical reasoning and unlocking the potential of scaling self-generated training data. Our code and models are available at [this link](https://github.com/georgao35/MARGE).
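The abstract's hit-guided idea can be illustrated with a toy sketch: roll out continuations from intermediate states, count how often they "hit" the reference answer, and use the hit rate as a credit signal, with no value model or external labels. Everything below (the function names, the simulated solver, the 32-rollout default) is an illustrative assumption, not the paper's actual implementation:

```python
import random

def sample_continuation(state, rng):
    """Stand-in for LLM decoding that completes a partial solution.

    In the real method the model generates the remaining reasoning
    steps; here we simulate a stochastic solver whose chance of
    reaching the correct answer grows with the number of steps
    already present in `state` (a pure toy assumption).
    """
    p_correct = 0.2 + 0.15 * len(state)
    return "correct" if rng.random() < min(p_correct, 0.95) else "wrong"

def hit_guided_credit(states, reference, num_rollouts=32, seed=0):
    """Estimate a per-state credit signal from rollout hit rates.

    A rollout "hits" when its final answer matches `reference`.
    States with higher hit rates are more promising restart points
    for further exploration.
    """
    rng = random.Random(seed)
    credit = {}
    for name, state in states.items():
        hits = sum(
            sample_continuation(state, rng) == reference
            for _ in range(num_rollouts)
        )
        credit[name] = hits / num_rollouts
    return credit

# Two intermediate states of one self-generated solution.
states = {
    "after_step_1": ["step1"],
    "after_step_3": ["step1", "step2", "step3"],
}
credit = hit_guided_credit(states, reference="correct")
print(credit)
```

Under this toy model, deeper correct prefixes earn higher hit rates, so credit concentrates on the states closest to a correct sub-goal, which is the intuition behind steering exploration with hits rather than a learned value function.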