🤖 AI Summary
To address the challenge of simultaneously achieving accuracy and diversity in mathematical reasoning with large language models (LLMs), this paper proposes a step-level evaluation-and-generation co-design framework that requires no human annotation. Methodologically, the authors (1) introduce a process reward model (PRM) constructed automatically via Monte Carlo tree search and similarity-augmented training, enabling fine-grained quality assessment of intermediate reasoning steps; and (2) integrate generative flow networks (GFlowNets) into LLM-based mathematical reasoning, using the PRM as a step-level reward signal to sample high-quality, diverse solution paths efficiently. Evaluated on Llama3.2-3B, the approach achieves a +2.59% absolute accuracy gain on MATH Level 5 and a +9.4% absolute improvement in zero-shot generalization on SAT MATH, advancing process supervision and diversity-aware generation in mathematical reasoning.
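The summary does not spell out how the annotation-free PRM labels are produced, but the standard recipe behind Monte-Carlo-style PRM construction is to score a partial solution by the fraction of rollouts from it that reach the correct final answer. A minimal sketch of that idea, assuming hypothetical `sample_completions` and `is_correct` helpers that are not from the paper:

```python
# Hypothetical sketch: Monte Carlo value estimation for automatic PRM labels.
# `sample_completions` and `is_correct` are illustrative stand-ins, not the
# paper's actual interfaces.
from typing import Callable, List

def mc_step_value(
    prefix_steps: List[str],
    sample_completions: Callable[[str, int], List[str]],  # LLM rollout function
    is_correct: Callable[[str], bool],                     # final-answer checker
    n_rollouts: int = 8,
) -> float:
    """Score a partial solution by the fraction of rollouts from this
    prefix that terminate in a correct final answer; such soft scores
    can serve as step-level training targets for a PRM."""
    prefix = "\n".join(prefix_steps)
    completions = sample_completions(prefix, n_rollouts)
    return sum(is_correct(c) for c in completions) / n_rollouts
```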
📝 Abstract
Achieving both accuracy and diversity in reasoning remains challenging for Large Language Models (LLMs) in complex domains like mathematics. A key bottleneck is evaluating intermediate reasoning steps to guide generation without costly human annotation. To address this, we first introduce a novel Process Reward Model (PRM) trained automatically using Monte Carlo Tree Search coupled with a similarity-based data augmentation technique, effectively capturing step-level reasoning quality. Leveraging this PRM, we then adapt Generative Flow Networks (GFlowNets) to operate at the reasoning-step level. Unlike traditional reinforcement learning, which maximizes a single reward, GFlowNets naturally sample diverse, high-quality solutions with probability proportional to their rewards, as measured by our PRM. Empirical evaluation shows strong improvements in both accuracy and solution diversity on challenging mathematical benchmarks (e.g., +2.59% absolute accuracy on MATH Level 5 for Llama3.2-3B), along with effective generalization to unseen datasets (+9.4% absolute on SAT MATH). Our work demonstrates the potential of PRM-guided, step-level GFlowNets for developing more robust and versatile mathematical reasoning in LLMs.
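To make the "sample proportional to reward" property concrete: a common GFlowNet training objective is trajectory balance, which for autoregressive generation (where each state has a unique parent, so the backward-policy term is trivial) drives the policy toward sampling solutions with probability proportional to their reward. The abstract does not state which objective the paper uses, so the following is an illustrative sketch under the assumption of trajectory balance, with the trajectory reward taken as the product of step-level PRM scores:

```python
# Illustrative trajectory-balance loss for step-level GFlowNet fine-tuning.
# The abstract does not specify the training objective; this assumes the
# common trajectory-balance formulation, with the trajectory reward defined
# as the product of step-level PRM scores (an assumption, not the paper's
# stated design).
import torch

def trajectory_balance_loss(
    log_pf_steps: torch.Tensor,    # log P_F(step_t | prefix) per step, shape [T]
    log_prm_scores: torch.Tensor,  # log PRM score per step, shape [T]
    log_z: torch.Tensor,           # learned scalar: log partition function
) -> torch.Tensor:
    # For autoregressive generation each state has a unique parent, so the
    # backward-policy term of trajectory balance vanishes (log P_B = 0).
    log_reward = log_prm_scores.sum()  # log of the product of step scores
    return (log_z + log_pf_steps.sum() - log_reward) ** 2
```

At its optimum this objective makes the policy sample complete solutions with probability proportional to their PRM-derived reward, which is what yields diverse high-quality solutions rather than collapse onto a single highest-reward path.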