Enhancing Test-Time Scaling of Large Language Models with Hierarchical Retrieval-Augmented MCTS

📅 2025-07-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limitations of large language models (LLMs) in mathematical reasoning—namely, constrained inference capabilities and reliance on high-quality Chain-of-Thought (CoT) data distilled from stronger models—this paper proposes R2-LLMs, a hierarchical retrieval-augmented reasoning framework that requires no external CoT annotations. Its core innovation is a two-tiered retrieval mechanism: coarse-grained template matching for problem abstraction and fine-grained intermediate-step retrieval to support stepwise reasoning, jointly guided by a process-oriented reward model within Monte Carlo Tree Search (MCTS) for candidate generation and decision optimization. By integrating retrieval-augmented in-context learning with hierarchical reasoning, R2-LLMs significantly improves reasoning generalization without external distillation data. On MATH500, GSM8K, and OlympiadBench-TO, R2-LLMs built upon LLaMA-3.1-8B achieves up to a 16% absolute accuracy gain over strong baselines, demonstrating its effectiveness and scalability for complex mathematical reasoning tasks.
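The coarse-grained stage described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the number-masking template abstraction and bag-of-words cosine similarity are stand-in assumptions for whatever template extractor and retriever R2-LLMs actually uses.

```python
import math
import re
from collections import Counter

def to_template(problem: str) -> str:
    """Coarse abstraction: mask numbers so structurally similar
    problems map to similar templates (illustrative stand-in)."""
    return re.sub(r"\d+(\.\d+)?", "<NUM>", problem.lower())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_similar(query: str, corpus: list[tuple[str, str]], k: int = 2):
    """Return the k problem-answer pairs whose templates are most
    similar to the query's template, for use as in-context demos."""
    qv = Counter(to_template(query).split())
    scored = [
        (cosine(qv, Counter(to_template(p).split())), p, a)
        for p, a in corpus
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(p, a) for _, p, a in scored[:k]]

corpus = [
    ("If 3 apples cost 6 dollars, how much do 5 apples cost?", "10 dollars"),
    ("Solve x^2 - 4 = 0 for x.", "x = 2 or x = -2"),
    ("If 2 pens cost 8 dollars, how much do 7 pens cost?", "28 dollars"),
]
demos = retrieve_similar("If 4 books cost 12 dollars, how much do 9 books cost?", corpus)
```

Because the numbers are masked out, both unit-price problems outrank the unrelated algebra problem, even though none shares surface numbers with the query.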

📝 Abstract
Test-time scaling has emerged as a promising paradigm in language modeling, leveraging additional computational resources at inference time to enhance model performance. In this work, we introduce R2-LLMs, a novel and versatile hierarchical retrieval-augmented reasoning framework designed to improve test-time scaling in large language models (LLMs) without requiring distillation from more advanced models to obtain chain-of-thought (CoT) training data. R2-LLMs enhances inference-time generalization by integrating dual-level retrieval-based in-context learning: (1) At the coarse level, our approach extracts abstract templates from complex reasoning problems and retrieves similar problem-answer pairs to facilitate high-level in-context learning; (2) At the fine level, during Monte Carlo Tree Search (MCTS), R2-LLMs efficiently retrieves analogous intermediate solution steps from reference mathematical problem datasets, refining step-wise reasoning with the aid of a process reward model (PRM) for scoring. R2-LLMs is a robust hierarchical reasoning-augmentation method that enhances in-context-level reasoning while seamlessly integrating with step-level tree search methods. Utilizing the PRM, it refines both candidate generation and decision-making for improved reasoning accuracy. Empirical evaluations on the MATH500, GSM8K, and OlympiadBench-TO datasets show substantial relative improvements of up to 16% using LLaMA-3.1-8B over the baselines, showcasing the effectiveness of our approach in complex reasoning tasks.
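The fine-level retrieval in point (2) can be sketched as below. This is a hedged illustration only: the token-level Jaccard similarity and the tiny in-memory reference list are assumptions standing in for the paper's retrieval over reference mathematical problem datasets.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve_step_hints(current_step: str, reference_steps: list[str], k: int = 1):
    """Fine-level retrieval: fetch the reference intermediate steps
    most similar to the step currently being expanded, to serve as
    hints that refine step-wise reasoning (illustrative stand-in)."""
    cur = set(current_step.lower().split())
    ranked = sorted(
        reference_steps,
        key=lambda s: jaccard(cur, set(s.lower().split())),
        reverse=True,
    )
    return ranked[:k]

refs = [
    "Apply the quadratic formula to x^2 - 4x + 3 = 0",
    "Factor the quadratic: x^2 - 5x + 6 = (x - 2)(x - 3)",
    "Take the derivative of both sides",
]
hints = retrieve_step_hints("Factor the quadratic x^2 - 4x + 3", refs)
```

The retrieved hint would then be appended to the prompt when the LLM generates candidate next steps inside the tree search.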
Problem

Research questions and friction points this paper is trying to address.

Improves test-time scaling in LLMs without distillation
Enhances inference-time generalization via hierarchical retrieval
Boosts reasoning accuracy with Monte Carlo Tree Search
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical retrieval-augmented reasoning framework R2-LLMs
Dual-level retrieval-based in-context learning
Monte Carlo Tree Search with process reward model
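A minimal sketch of how a process reward model can steer step-level search. This simplifies the paper's MCTS to a greedy beam-style expansion, and `prm_score` and `propose_steps` are stub assumptions: a real PRM is a learned scorer and real candidate steps come from LLM sampling.

```python
def prm_score(steps: list[int]) -> float:
    """Stub process reward model: scores a partial reasoning path.
    Here the 'reasoning' is just accumulating numbers toward a
    target of 10, purely for illustration."""
    return -abs(sum(steps) - 10)

def propose_steps(steps: list[int]) -> list[int]:
    """Stub candidate generator standing in for LLM sampling of
    intermediate solution steps."""
    return [1, 2, 3]

def prm_guided_search(max_depth: int = 4, beam: int = 2) -> list[int]:
    """At each depth, expand all frontier paths and keep the `beam`
    partial paths the PRM scores highest (a simplified stand-in for
    PRM-scored MCTS selection and expansion)."""
    frontier: list[list[int]] = [[]]
    for _ in range(max_depth):
        candidates = [path + [s] for path in frontier for s in propose_steps(path)]
        candidates.sort(key=prm_score, reverse=True)
        frontier = candidates[:beam]
    return max(frontier, key=prm_score)

best = prm_guided_search()  # a length-4 path whose steps sum to the target 10
```

Pruning low-reward branches early is what lets the extra test-time compute concentrate on promising reasoning paths instead of exhaustively expanding the tree.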
👥 Authors
Alex ZH Dou, Case Western Reserve University
Zhongwei Wan, The Ohio State University (PhD student; LLM, Multimodal, NLP)
Dongfei Cui, Duke University
Xin Wang, The Ohio State University
Jing Xiong, University of Hong Kong
Haokun Lin, City University of Hong Kong & CASIA (Multi-modal Learning, Efficient Deep Learning)
Chaofan Tao, University of Hong Kong
Shen Yan, ByteDance
Mi Zhang, The Ohio State University