🤖 AI Summary
Large language models (LLMs) exhibit weak factual grounding and limited logical coherence on knowledge-intensive complex reasoning tasks, e.g., commonsense and medical reasoning.
Method: We propose RARE (Retrieval-Augmented Reasoning Enhancement), a framework extending rStar that integrates two retrieval-augmented actions (A6/A7) into Monte Carlo Tree Search (MCTS), replaces the conventional discriminator with a Retrieval-Augmented Factuality Scorer for fact-prioritized path selection, and incorporates dynamic sub-question retrieval, context-aware re-answering, and multi-hop fact verification.
Contribution/Results: Evaluated with LLaMA 3.1, RARE significantly improves both logical coherence and factual accuracy. On multiple benchmarks, including CommonsenseQA, MedQA, and StrategyQA, it brings open-source models to performance competitive with GPT-4 and GPT-4o. To our knowledge, RARE is the first MCTS-based framework to enable fact-aware reasoning, establishing a new paradigm for factual, search-driven inference in LLMs.
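The two retrieval-augmented actions can be sketched in miniature. This is an illustrative toy, not the paper's implementation: the word-overlap retriever and the function names (`retrieve`, `action_a6`, `action_a7`) are assumptions standing in for RARE's actual retriever and LLM-driven query generation.

```python
# Toy sketch of RARE's retrieval-augmented MCTS actions (assumed shapes,
# not the paper's code). A6 retrieves for the full problem statement;
# A7 retrieves per generated sub-question and re-answers each one.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank passages by word overlap with the query (toy retriever)."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda p: len(q & set(p.lower().split())),
                    reverse=True)
    return ranked[:k]

def action_a6(problem: str, corpus: list[str]) -> dict:
    """A6: derive a query from the initial problem, retrieve, and
    augment the answering context with the retrieved passages."""
    evidence = retrieve(problem, corpus)  # query = problem statement
    return {"action": "A6", "evidence": evidence,
            "context": problem + " | " + " ".join(evidence)}

def action_a7(sub_questions: list[str], corpus: list[str]) -> list[dict]:
    """A7: retrieve for each sub-question and re-answer it with its
    own retrieved context."""
    return [{"action": "A7", "sub_question": sq,
             "evidence": retrieve(sq, corpus, k=1)}
            for sq in sub_questions]

corpus = [
    "Aspirin inhibits platelet aggregation.",
    "Paris is the capital of France.",
    "Monte Carlo Tree Search balances exploration and exploitation.",
]
step = action_a6("Which drug inhibits platelet aggregation?", corpus)
subs = action_a7(["Which city is the capital of France?"], corpus)
```

In the full method these actions are expanded as nodes in the MCTS tree alongside rStar's original five actions, so retrieval can occur at any depth of the reasoning trajectory.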
📝 Abstract
This work introduces RARE (Retrieval-Augmented Reasoning Enhancement), a versatile extension to the mutual reasoning framework rStar, aimed at enhancing reasoning accuracy and factual integrity of large language models (LLMs) on complex, knowledge-intensive tasks such as commonsense and medical reasoning. RARE incorporates two novel actions within the Monte Carlo Tree Search (MCTS) framework: A6, which generates search queries from the initial problem statement, performs information retrieval with those queries, and augments reasoning with the retrieved data to formulate the final answer; and A7, which applies information retrieval specifically to the generated sub-questions and re-answers them with the relevant retrieved context. Additionally, a Retrieval-Augmented Factuality Scorer is proposed to replace the original discriminator, prioritizing reasoning paths that meet high standards of factuality. Experimental results with LLaMA 3.1 show that RARE enables open-source LLMs to achieve performance competitive with top proprietary models like GPT-4 and GPT-4o. This research establishes RARE as a scalable solution for improving LLMs in domains where logical coherence and factual integrity are critical.
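The fact-prioritized path selection can be illustrated with a minimal sketch. The overlap-based `support` proxy below is an assumption for demonstration; in RARE the per-statement verification is performed by a retrieval-augmented LLM judge, not lexical matching.

```python
# Hedged sketch of a Retrieval-Augmented Factuality Scorer: score each
# candidate reasoning path by how well its statements are supported by
# retrieved evidence, then keep the best-supported path. Word overlap
# is a toy stand-in for the paper's LLM-based factuality judgment.

def support(statement: str, evidence: list[str]) -> float:
    """Fraction of statement words appearing in the evidence (toy proxy)."""
    words = set(statement.lower().split())
    ev_words = set(" ".join(evidence).lower().split())
    return len(words & ev_words) / max(len(words), 1)

def factuality_score(path: list[str], evidence: list[str]) -> float:
    """Average per-statement support across the reasoning path."""
    return sum(support(s, evidence) for s in path) / max(len(path), 1)

def select_path(paths: list[list[str]], evidence: list[str]) -> list[str]:
    """Fact-prioritized selection: the highest-scoring path wins."""
    return max(paths, key=lambda p: factuality_score(p, evidence))

evidence = ["aspirin inhibits platelet aggregation", "aspirin is an nsaid"]
paths = [
    ["aspirin inhibits platelet aggregation", "aspirin is an nsaid"],
    ["aspirin raises blood pressure sharply"],
]
best = select_path(paths, evidence)
```

Replacing rStar's discriminator with such a scorer shifts path selection from agreement between two models to agreement with retrieved evidence, which is what grounds the final answer factually.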