Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo

📅 2024-10-02

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the low sampling efficiency and high supervision cost of verification for multi-step mathematical reasoning with large language models (LLMs), this paper proposes a verification method based on Twisted Sequential Monte Carlo (TSMC). The method estimates the expected future reward of partial solutions and uses these estimates to guide sampling, concentrating computation on promising reasoning paths. Because this training target does not require step-wise human annotations, annotation overhead is reduced. The theoretical analysis covers both TSMC and existing verification methods and explains where the latter lose sampling efficiency. Experiments on multiple math benchmarks support the analysis and show advantages over existing verification approaches.

📝 Abstract

Augmenting the multi-step reasoning abilities of Large Language Models (LLMs) has been a persistent challenge. Recently, verification has shown promise in improving solution consistency by evaluating generated outputs. However, current verification approaches suffer from sampling inefficiencies, requiring a large number of samples to achieve satisfactory performance. Additionally, training an effective verifier often depends on extensive process supervision, which is costly to acquire. In this paper, we address these limitations by introducing a novel verification method based on Twisted Sequential Monte Carlo (TSMC). TSMC sequentially refines its sampling effort to focus exploration on promising candidates, resulting in more efficient generation of high-quality solutions. We apply TSMC to LLMs by estimating the expected future rewards at partial solutions. This approach results in a more straightforward training target that eliminates the need for step-wise human annotations. We empirically demonstrate the advantages of our method across multiple math benchmarks, and also validate our theoretical analysis of both our approach and existing verification methods.
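
For orientation, the generic twisted SMC setup behind "estimating the expected future rewards at partial solutions" can be written as follows. This is the textbook formulation under the assumption that the base model itself is used as the proposal. The notation (p, φ, ψ_t, w_t) is chosen here for illustration and may differ from the paper's.

```latex
% Assumed notation: p is the base LLM, x_{1:t} the first t reasoning steps,
% \phi(x_{1:T}) \ge 0 the reward of a complete solution, \psi_t the twist function.
\begin{aligned}
\sigma(x_{1:T}) &\propto p(x_{1:T})\,\phi(x_{1:T})
  && \text{(final target over complete solutions)} \\
\pi_t(x_{1:t}) &\propto p(x_{1:t})\,\psi_t(x_{1:t})
  && \text{(intermediate target over partial solutions)} \\
\psi_t^{*}(x_{1:t}) &= \mathbb{E}_{p(x_{t+1:T}\mid x_{1:t})}\big[\phi(x_{1:T})\big]
  && \text{(twist matching the marginal of } \sigma\text{)} \\
w_t &= \frac{\psi_t(x_{1:t})}{\psi_{t-1}(x_{1:t-1})}
  && \text{(incremental weight when proposing from } p\text{)}
\end{aligned}
```

With the twist ψ*_t, each intermediate target equals the marginal of the final target, so resampling by w_t shifts particles toward partial solutions that are more likely to end in a high reward.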

Problem

Research questions and friction points this paper is trying to address.

Enhance multi-step reasoning in Large Language Models (LLMs).

Improve sampling efficiency in verification methods.

Reduce dependency on costly step-wise human annotations.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Twisted Sequential Monte Carlo for efficient sampling

Estimates expected future rewards at partial solutions

Eliminates need for step-wise human annotations
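
The three ideas above can be illustrated with a toy sketch. It is not the paper's implementation: the "model" is a hypothetical coin flip that writes a correct step with probability `P_CORRECT`, the reward is 1 only when every step is correct, and `value` is the exact expected future reward, which a real system would have to learn. All names and constants are assumptions made for illustration.

```python
import random

P_CORRECT = 0.6  # assumed chance that the toy "model" writes a correct step
NUM_STEPS = 5    # assumed number of reasoning steps per solution


def sample_step(rng):
    """Toy stand-in for an LLM call: True means the new step is correct."""
    return rng.random() < P_CORRECT


def reward(solution):
    """Final reward: 1.0 only for a complete solution with every step correct."""
    return 1.0 if len(solution) == NUM_STEPS and all(solution) else 0.0


def value(partial):
    """Exact expected future reward of a partial solution in this toy model."""
    if not all(partial):
        return 0.0  # one wrong step already rules out a correct solution
    return P_CORRECT ** (NUM_STEPS - len(partial))


def plain_sampling(num_samples, rng):
    """Baseline: sample complete solutions independently from the toy model."""
    return [[sample_step(rng) for _ in range(NUM_STEPS)] for _ in range(num_samples)]


def twisted_smc(num_particles, rng):
    """Extend all particles one step at a time, reweight by value ratio, resample."""
    particles = [[] for _ in range(num_particles)]
    prev_values = [value(p) for p in particles]
    for _ in range(NUM_STEPS):
        # Propose the next step from the base model.
        particles = [p + [sample_step(rng)] for p in particles]
        values = [value(p) for p in particles]
        # Incremental weight: twist after the step divided by twist before it.
        weights = [v / pv for v, pv in zip(values, prev_values)]
        if sum(weights) == 0.0:
            return particles  # every particle failed; return the partial solutions
        # Multinomial resampling concentrates particles on promising prefixes.
        chosen = rng.choices(particles, weights=weights, k=num_particles)
        particles = [list(p) for p in chosen]
        prev_values = [value(p) for p in particles]
    return particles


def success_rate(solutions):
    """Fraction of solutions that receive reward 1."""
    return sum(reward(s) for s in solutions) / len(solutions)
```

Calling `success_rate(twisted_smc(16, random.Random(0)))` and `success_rate(plain_sampling(2000, random.Random(0)))` compares the two samplers. In this toy setting a particle with a wrong step gets weight zero and is never resampled, so the particle budget stays on prefixes that can still reach a correct answer. No step-level labels are used anywhere: the only supervision signal is the final reward that `value` predicts.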

🔎 Similar Papers

Token-Supervised Value Models for Enhancing Mathematical Reasoning Capabilities of Large Language Models