🤖 AI Summary
Existing retrieval-augmented generation (RAG) pipelines for medical question answering struggle with the hallucinations and outdated knowledge of large language models (LLMs), and are further hindered by irrelevant context, poorly targeted queries, and single-source retrieval bias. To address these limitations, the authors propose RAG² (RAtionale-Guided RAG), which introduces three components: (1) a small perplexity-supervised filter that keeps informative document snippets and discards distractors; (2) LLM-generated rationales used as retrieval queries to improve the utility of retrieved snippets; and (3) a balanced retrieval mechanism that draws evidence evenly from four biomedical corpora to mitigate retriever bias. Across three medical question-answering benchmarks, RAG² improves state-of-the-art LLMs of varying sizes by up to 6.1% and outperforms the previous best medical RAG model by up to 5.6%.
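As a rough intuition for the perplexity-supervised filter, a snippet can be labeled "helpful" when conditioning on it lowers the perplexity of the gold answer. The following is a toy sketch only: the add-one-smoothed unigram model standing in for the LLM, the vocabulary size, and the labeling rule are all illustrative assumptions, not the paper's actual implementation.

```python
import math
from collections import Counter

def perplexity(text, counts, vocab_size):
    """Per-token perplexity of `text` under a toy add-one-smoothed
    unigram model defined by the token `counts` (stands in for an LLM)."""
    tokens = text.lower().split()
    total = sum(counts.values())
    log_prob = sum(
        math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens
    )
    return math.exp(-log_prob / max(len(tokens), 1))

def label_snippet(question, snippet, answer, vocab_size=1000):
    """Hypothetical supervision signal: label the snippet helpful if
    adding it to the context lowers the gold answer's perplexity."""
    ctx = Counter(question.lower().split())
    ppl_without = perplexity(answer, ctx, vocab_size)
    ctx_with = ctx + Counter(snippet.lower().split())
    ppl_with = perplexity(answer, ctx_with, vocab_size)
    return ppl_with < ppl_without
```

A snippet that shares tokens with the gold answer raises those tokens' probability and earns a positive label, while an off-topic distractor does not; a real filter would use the LLM's own perplexity and then be distilled into a small model.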
📝 Abstract
Large language models (LLMs) hold significant potential for applications in biomedicine, but they struggle with hallucinations and outdated knowledge. While retrieval-augmented generation (RAG) is generally employed to address these issues, it also has its own set of challenges: (1) LLMs are vulnerable to irrelevant or incorrect context, (2) medical queries are often not well-targeted for helpful information, and (3) retrievers are prone to bias toward the specific source corpus they were trained on. In this study, we present RAG$^2$ (RAtionale-Guided RAG), a new framework for enhancing the reliability of RAG in biomedical contexts. RAG$^2$ incorporates three key innovations: (1) a small filtering model trained on perplexity-based labels of rationales, which selectively augments informative snippets of documents while filtering out distractors; (2) LLM-generated rationales used as queries to improve the utility of retrieved snippets; and (3) a structure designed to retrieve snippets evenly from a comprehensive set of four biomedical corpora, effectively mitigating retriever bias. Our experiments demonstrate that RAG$^2$ improves state-of-the-art LLMs of varying sizes by up to 6.1%, and that it outperforms the previous best medical RAG model by up to 5.6% across three medical question-answering benchmarks. Our code is available at https://github.com/dmis-lab/RAG2.
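The third innovation, retrieving snippets evenly across corpora, can be pictured as a round-robin merge of per-corpus ranked lists, so that no single corpus dominates the top-k evidence handed to the LLM. A minimal sketch under assumed inputs (the merge policy and deduplication here are illustrative, not the paper's exact retrieval structure):

```python
from itertools import chain, zip_longest

def balanced_merge(ranked_lists, k):
    """Interleave per-corpus ranked snippet lists round-robin,
    deduplicating, so each corpus contributes evenly to the top-k."""
    merged, seen = [], set()
    for snippet in chain.from_iterable(zip_longest(*ranked_lists)):
        if snippet is not None and snippet not in seen:
            seen.add(snippet)
            merged.append(snippet)
        if len(merged) == k:
            break
    return merged
```

For example, merging the top results of four separate retrievers (one per biomedical corpus) yields an evidence set whose first k slots alternate across sources instead of being filled by whichever corpus the retriever happens to favor.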