🤖 AI Summary
Large language models (LLMs) in medical question answering suffer from hallucination and outdated knowledge, while existing retrieval-augmented generation (RAG) approaches are limited by weak modeling of human-like reasoning during retrieval and by low-quality medical corpora that yield irrelevant or noisy snippets. To address these limitations, the authors propose Discuss-RAG, a plug-and-play framework built around an agent-driven pre-retrieval "discussion" mechanism. A summarizer agent orchestrates a team of medical expert agents to emulate multi-turn brainstorming, improving the relevance of retrieved content, and a decision-making agent vets the retrieved snippets before final integration. Because the module is plug-and-play, retrieval refinement is decoupled from answer generation. Evaluation on four standard biomedical QA benchmarks (BioASQ, PubMedQA, MedQA-USMLE, and MMLU-Med) shows consistent improvements over MedRAG, with answer-accuracy gains of up to 16.67% on BioASQ and 12.20% on PubMedQA. The implementation is publicly available.
📝 Abstract
Medical question answering (QA) is a reasoning-intensive task that remains challenging for large language models (LLMs) due to hallucinations and outdated domain knowledge. Retrieval-Augmented Generation (RAG) provides a promising post-training solution by leveraging external knowledge. However, existing medical RAG systems suffer from two key limitations: (1) a lack of modeling for human-like reasoning behaviors during information retrieval, and (2) reliance on suboptimal medical corpora, which often results in the retrieval of irrelevant or noisy snippets. To overcome these challenges, we propose Discuss-RAG, a plug-and-play module designed to enhance the medical QA RAG system through collaborative agent-based reasoning. Our method introduces a summarizer agent that orchestrates a team of medical experts to emulate multi-turn brainstorming, thereby improving the relevance of retrieved content. Additionally, a decision-making agent evaluates the retrieved snippets before their final integration. Experimental results on four benchmark medical QA datasets show that Discuss-RAG consistently outperforms MedRAG, improving answer accuracy by up to 16.67% on BioASQ and 12.20% on PubMedQA. The code is available at: https://github.com/LLM-VLM-GSL/Discuss-RAG.
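The abstract outlines a three-stage flow: a summarizer agent runs multi-turn expert brainstorming to refine the retrieval query, a decision-making agent vets the retrieved snippets, and the surviving context feeds answer generation. The sketch below is a minimal, hypothetical rendering of that flow; the `llm` and `retrieve` callables, agent prompts, round count, and YES/NO verdict format are all illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DiscussRAG:
    """Hypothetical sketch of the Discuss-RAG pipeline from the abstract."""
    llm: Callable[[str], str]              # stand-in for any text-in/text-out model call
    retrieve: Callable[[str], List[str]]   # query -> candidate corpus snippets
    n_rounds: int = 2                      # assumed brainstorming depth

    def brainstorm(self, question: str) -> str:
        """Summarizer agent: orchestrate expert turns into a refined summary
        that is then used as the retrieval query."""
        summary = ""
        for turn in range(self.n_rounds):
            expert_view = self.llm(
                f"As a medical expert (round {turn + 1}), discuss: {question}\n"
                f"Prior summary: {summary}"
            )
            summary = self.llm(
                f"Summarize the discussion so far:\n{summary}\n{expert_view}"
            )
        return summary

    def verify(self, question: str, snippet: str) -> bool:
        """Decision-making agent: keep a snippet only if judged relevant."""
        verdict = self.llm(
            f"Question: {question}\nSnippet: {snippet}\n"
            "Is this snippet relevant? Answer YES or NO."
        )
        return verdict.strip().upper().startswith("YES")

    def answer(self, question: str) -> str:
        """Full flow: brainstorm -> retrieve -> vet -> generate."""
        summary = self.brainstorm(question)
        kept = [s for s in self.retrieve(summary) if self.verify(question, s)]
        context = "\n".join(kept)
        return self.llm(f"Context:\n{context}\nQuestion: {question}\nAnswer:")
```

Because the agents interact only through plain text prompts, the module slots in front of any existing RAG generator without modifying it, which is the plug-and-play property the abstract claims.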