RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models

📅 2024-12-03
🏛️ arXiv.org
📈 Citations: 6
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit weak factual grounding and limited logical coherence on knowledge-intensive complex reasoning tasks such as commonsense and medical reasoning. Method: The authors propose RARE, a novel extension of the rStar mutual-reasoning framework that integrates dual retrieval-augmented actions (A6/A7) into Monte Carlo Tree Search (MCTS), replaces the conventional discriminator with a retrieval-augmented factuality scorer for fact-prioritized path selection, and incorporates dynamic sub-problem retrieval, context-aware re-answering, and multi-hop fact verification. Contribution/Results: Evaluated with LLaMA 3.1, RARE significantly improves both logical coherence and factual accuracy. On multiple benchmarks, including CommonsenseQA, MedQA, and StrategyQA, it brings open-source models to performance competitive with GPT-4 and GPT-4o. To the authors' knowledge, RARE is the first MCTS-based framework to enable fact-aware reasoning, establishing a new paradigm for factual, search-driven inference in LLMs.

📝 Abstract
This work introduces RARE (Retrieval-Augmented Reasoning Enhancement), a versatile extension to the mutual reasoning framework (rStar), aimed at enhancing reasoning accuracy and factual integrity across large language models (LLMs) for complex, knowledge-intensive tasks such as commonsense and medical reasoning. RARE incorporates two innovative actions within the Monte Carlo Tree Search (MCTS) framework: A6, which generates search queries based on the initial problem statement, performs information retrieval using those queries, and augments reasoning with the retrieved data to formulate the final answer; and A7, which leverages information retrieval specifically for generated sub-questions and re-answers these sub-questions with the relevant contextual information. Additionally, a Retrieval-Augmented Factuality Scorer is proposed to replace the original discriminator, prioritizing reasoning paths that meet high standards of factuality. Experimental results with LLaMA 3.1 show that RARE enables open-source LLMs to achieve performance competitive with proprietary models such as GPT-4 and GPT-4o. This research establishes RARE as a scalable solution for improving LLMs in domains where logical coherence and factual integrity are critical.
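The two actions described above can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in, assuming a keyword-overlap retriever and a toy model in place of the paper's actual retriever and LLM; `ToyLLM`, `action_a6`, and `action_a7` are illustrative names, not the authors' code.

```python
# Illustrative sketch of RARE's two retrieval-augmented MCTS actions.
# The retriever and "LLM" are toy stand-ins, not the paper's implementation.

def retrieve(query, corpus):
    """Toy retriever: return corpus passages sharing any word with the query."""
    q_words = set(query.lower().split())
    return [p for p in corpus if q_words & set(p.lower().split())]

class ToyLLM:
    """Hypothetical stand-in for a real LLM."""
    def generate_queries(self, problem):
        # A real model would rewrite the problem into several search queries.
        return [problem]

    def answer(self, question, context):
        # Crude proxy for conditioning on retrieved evidence: pick the
        # passage with the largest word overlap with the question.
        q = set(question.lower().split())
        return max(context,
                   key=lambda p: len(q & set(p.lower().split())),
                   default="unknown")

def action_a6(problem, corpus, llm):
    """A6: generate queries from the problem, retrieve, answer with context."""
    queries = llm.generate_queries(problem)
    context = [p for q in queries for p in retrieve(q, corpus)]
    return llm.answer(problem, context)

def action_a7(sub_questions, corpus, llm):
    """A7: retrieve per sub-question, then re-answer each with its context."""
    return [llm.answer(sq, retrieve(sq, corpus)) for sq in sub_questions]
```

In the full framework these actions are candidate moves inside the MCTS loop; the sketch only shows the retrieve-then-answer shape of each action.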
Problem

Research questions and friction points this paper is trying to address.

Enhancing reasoning accuracy in large language models
Improving factual integrity for knowledge-intensive tasks
Scaling logical coherence in complex reasoning domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-augmented reasoning in MCTS framework
Generates and retrieves queries for reasoning enhancement
Retrieval-augmented factuality scorer replaces original discriminator
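The last point, fact-prioritized path selection, can be sketched as follows. This is a minimal illustration under the assumption that a path's score is the fraction of its intermediate claims supported by retrieved evidence; the word-overlap support check and all function names are hypothetical, not the paper's scorer.

```python
# Hedged sketch of a retrieval-augmented factuality scorer: score each
# candidate reasoning path by how many of its claims the retrieved
# evidence supports, then keep the highest-scoring path.

def supported(claim, evidence):
    """Toy support check: a claim counts as supported if a majority of its
    words appear in some retrieved evidence passage."""
    words = set(claim.lower().split())
    threshold = len(words) // 2 + 1
    return any(len(words & set(e.lower().split())) >= threshold
               for e in evidence)

def factuality_score(path, evidence):
    """Fraction of the path's claims that the evidence supports."""
    return sum(supported(claim, evidence) for claim in path) / len(path)

def select_path(paths, evidence):
    """Fact-prioritized selection: return the best-scoring reasoning path."""
    return max(paths, key=lambda p: factuality_score(p, evidence))
```

A real scorer would use a retriever plus an LLM judge rather than word overlap, but the selection logic, replacing the discriminator's vote with a factuality-weighted ranking, has this shape.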
Hieu Tran
University of Maryland, College Park
Natural Language Processing, Large Language Models
Zonghai Yao
UMass Amherst
Medical-LLM, Multi-agent AI Hospital, Clinical Reasoning, Synthetic Data, Patient Education
Junda Wang
University of Massachusetts Amherst
Natural Language Processing, Causal Inference, Healthcare
Yifan Zhang
Miner School of Computer and Information Sciences, University of Massachusetts Lowell, MA, USA
Zhichao Yang
Manning College of Information and Computer Sciences, University of Massachusetts Amherst, MA, USA
Hong Yu
Manning College of Information and Computer Sciences, University of Massachusetts Amherst, MA, USA; Department of Medicine, University of Massachusetts Medical School, Worcester, MA, USA; Miner School of Computer and Information Sciences, University of Massachusetts Lowell, MA, USA; Center for Healthcare Organization and Implementation Research, VA Bedford Health Care, MA, USA