🤖 AI Summary
To address error propagation and multi-branch verification bottlenecks in open-ended, knowledge-intensive complex reasoning with large language models (LLMs), this paper proposes ARise, a risk-aware, dynamic retrieval-augmented reasoning framework based on Monte Carlo Tree Search (MCTS). The method integrates retrieval-augmented generation (RAG) dynamically into the selection of reasoning paths. Key contributions include: (1) risk assessment of intermediate reasoning states, so that errors are detected early and kept from propagating; (2) a risk-adaptive MCTS search strategy that balances the exploration–exploitation trade-off while verifying multiple hypothesis branches; and (3) integration of dynamic RAG within the MCTS-based reasoning pipeline. In the reported experiments, ARise outperforms state-of-the-art knowledge-augmented reasoning (KAR) methods by up to 23.10% and the latest RAG-equipped large reasoning models by up to 25.37%.
📝 Abstract
Large language models (LLMs) have demonstrated impressive capabilities, and enhancing their reasoning by scaling test-time compute is receiving increasing attention. However, their application in open-ended, knowledge-intensive, complex reasoning scenarios is still limited. Reasoning-oriented methods struggle to generalize to open-ended scenarios because they implicitly assume complete world knowledge. Meanwhile, knowledge-augmented reasoning (KAR) methods fail to address two core challenges: 1) error propagation, where errors in early steps cascade through the chain, and 2) the verification bottleneck, where the explore-exploit trade-off arises in multi-branch decision processes. To overcome these limitations, we introduce ARise, a novel framework that integrates risk assessment of intermediate reasoning states with dynamic retrieval-augmented generation (RAG) within a Monte Carlo tree search paradigm. This approach enables effective construction and optimization of reasoning plans across multiple maintained hypothesis branches. Experimental results show that ARise significantly outperforms state-of-the-art KAR methods by up to 23.10%, and the latest RAG-equipped large reasoning models by up to 25.37%.
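The abstract does not specify how risk assessment, retrieval, and tree search are combined, so the following is only a minimal illustrative sketch under stated assumptions, not the ARise algorithm itself. It assumes that a node's value is taken as one minus a risk score, that retrieval is called once per expanded state, and that branches are chosen with the standard UCT rule. The names `search`, `retrieve`, `expand`, `risk`, and the toy functions are hypothetical stand-ins for a retriever, an LLM step proposer, and a risk estimator.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    state: str                                   # question plus reasoning steps so far
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0


def uct(child: Node, c: float) -> float:
    """Standard UCT score: mean value plus an exploration bonus."""
    if child.visits == 0:
        return math.inf                          # always try unvisited branches first
    mean = child.value_sum / child.visits
    return mean + c * math.sqrt(math.log(child.parent.visits) / child.visits)


def search(
    question: str,
    retrieve: Callable[[str], List[str]],        # hypothetical retriever
    expand: Callable[[str, List[str]], List[str]],  # hypothetical step proposer
    risk: Callable[[str], float],                # hypothetical risk score in [0, 1]
    iterations: int = 50,
    c: float = 1.4,
) -> List[str]:
    root = Node(state=question)
    for _ in range(iterations):
        # Selection: descend to a leaf, balancing exploration and exploitation.
        node = root
        while node.children:
            node = max(node.children, key=lambda ch: uct(ch, c))
        # Evaluation (assumption): low risk maps to high value.
        value = 1.0 - risk(node.state)
        # Expansion: retrieve evidence for this state, then propose next steps.
        docs = retrieve(node.state)
        for step in expand(node.state, docs):
            node.children.append(Node(state=node.state + "\n" + step, parent=node))
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    # Read off a plan by following the most-visited child at each level.
    plan, node = [], root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
        plan.append(node.state.rsplit("\n", 1)[-1])
    return plan


# Toy stand-ins: every state offers a "good" and a "bad" step, up to depth 3.
def toy_retrieve(state: str) -> List[str]:
    return []


def toy_expand(state: str, docs: List[str]) -> List[str]:
    return ["good", "bad"] if state.count("\n") < 3 else []


def toy_risk(state: str) -> float:
    return 1.0 if "bad" in state.split("\n")[1:] else 0.0


plan = search("Q", toy_retrieve, toy_expand, toy_risk)
```

In this toy setup, any branch containing a "bad" step is scored as maximally risky, so the search spends most of its visits on the low-risk branch while still occasionally revisiting alternatives through the exploration term. In a real system the three callables would wrap a retriever and LLM calls, and the risk estimate would likely be far noisier than this deterministic stub.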