🤖 AI Summary
Large language models (LLMs) in medical question answering suffer from hallucination and outdated knowledge, while existing retrieval-augmented generation (RAG) approaches are limited by weak modeling of human-like reasoning during retrieval and by low-quality medical corpora that yield irrelevant or noisy snippets. To address these limitations, the authors propose Discuss-RAG, a plug-and-play framework built around an agent-driven pre-retrieval "discussion" mechanism. A summarizer agent orchestrates a team of medical expert agents to emulate multi-turn brainstorming, improving the relevance of retrieved content, and a decision-making agent vets the retrieved snippets before final integration. Because the module is plug-and-play, retrieval refinement is decoupled from answer generation. Evaluation on four standard biomedical QA benchmarks (BioASQ, PubMedQA, MedQA-USMLE, and MMLU-Med) shows consistent improvements over MedRAG, with answer-accuracy gains of up to 16.67% on BioASQ and 12.20% on PubMedQA. The implementation is publicly available.
📝 Abstract
Medical question answering (QA) is a reasoning-intensive task that remains challenging for large language models (LLMs) due to hallucinations and outdated domain knowledge. Retrieval-Augmented Generation (RAG) provides a promising post-training solution by leveraging external knowledge. However, existing medical RAG systems suffer from two key limitations: (1) a lack of modeling for human-like reasoning behaviors during information retrieval, and (2) reliance on suboptimal medical corpora, which often results in the retrieval of irrelevant or noisy snippets. To overcome these challenges, we propose Discuss-RAG, a plug-and-play module designed to enhance the medical QA RAG system through collaborative agent-based reasoning. Our method introduces a summarizer agent that orchestrates a team of medical experts to emulate multi-turn brainstorming, thereby improving the relevance of retrieved content. Additionally, a decision-making agent evaluates the retrieved snippets before their final integration. Experimental results on four benchmark medical QA datasets show that Discuss-RAG consistently outperforms MedRAG, improving answer accuracy by up to 16.67% on BioASQ and 12.20% on PubMedQA. The code is available at: https://github.com/LLM-VLM-GSL/Discuss-RAG.
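The abstract outlines a three-stage flow: a summarizer agent runs multi-turn expert brainstorming to refine the retrieval query, a decision-making agent vets the retrieved snippets, and the surviving context feeds answer generation. The sketch below is a minimal, hypothetical rendering of that flow; the `llm` and `retrieve` callables, agent prompts, round count, and YES/NO verdict format are all illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DiscussRAG:
    """Hypothetical sketch of the Discuss-RAG pipeline from the abstract."""
    llm: Callable[[str], str]              # stand-in for any text-in/text-out model call
    retrieve: Callable[[str], List[str]]   # query -> candidate corpus snippets
    n_rounds: int = 2                      # assumed brainstorming depth

    def brainstorm(self, question: str) -> str:
        """Summarizer agent: orchestrate expert turns into a refined summary
        that is then used as the retrieval query."""
        summary = ""
        for turn in range(self.n_rounds):
            expert_view = self.llm(
                f"As a medical expert (round {turn + 1}), discuss: {question}\n"
                f"Prior summary: {summary}"
            )
            summary = self.llm(
                f"Summarize the discussion so far:\n{summary}\n{expert_view}"
            )
        return summary

    def verify(self, question: str, snippet: str) -> bool:
        """Decision-making agent: keep a snippet only if judged relevant."""
        verdict = self.llm(
            f"Question: {question}\nSnippet: {snippet}\n"
            "Is this snippet relevant? Answer YES or NO."
        )
        return verdict.strip().upper().startswith("YES")

    def answer(self, question: str) -> str:
        """Full flow: brainstorm -> retrieve -> vet -> generate."""
        summary = self.brainstorm(question)
        kept = [s for s in self.retrieve(summary) if self.verify(question, s)]
        context = "\n".join(kept)
        return self.llm(f"Context:\n{context}\nQuestion: {question}\nAnswer:")
```

Because the agents interact only through plain text prompts, the module slots in front of any existing RAG generator without modifying it, which is the plug-and-play property the abstract claims.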