From Conflict to Consensus: Boosting Medical Reasoning via Multi-Round Agentic RAG

📅 2026-02-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the susceptibility of large language models to hallucinations and outdated knowledge in medical question answering, as well as the limited multi-hop reasoning capability and noise robustness of existing retrieval-augmented generation (RAG) approaches. To overcome these limitations, the authors propose MA-RAG, a framework that treats semantic conflict among candidate answers as an active signal for iteratively refining retrieval queries and reasoning trajectories through a multi-agent process, enabling the co-evolution of retrieval and reasoning. By integrating self-consistency principles with test-time scaling, MA-RAG achieves an average accuracy improvement of 6.8 points over the backbone model across seven medical QA benchmarks, significantly outperforming current RAG and test-time scaling methods.
📝 Abstract
Large Language Models (LLMs) exhibit strong reasoning capabilities in medical question answering, but their tendency to produce hallucinations and rely on outdated knowledge poses critical risks in healthcare. While Retrieval-Augmented Generation (RAG) mitigates these issues, existing methods rely on noisy token-level signals and lack the multi-round refinement required for complex reasoning. In this paper, we propose MA-RAG (Multi-Round Agentic RAG), a framework that facilitates test-time scaling for complex medical reasoning by iteratively evolving both external evidence and internal reasoning history within an agentic refinement loop. At each round, the agent transforms semantic conflict among candidate responses into actionable queries for retrieving external evidence, while optimizing historical reasoning traces to mitigate long-context degradation. MA-RAG extends the self-consistency principle by leveraging the lack of consistency as a proactive signal for multi-round agentic reasoning and retrieval, and mirrors a boosting mechanism that iteratively minimizes the residual error toward a stable, high-fidelity medical consensus. Extensive evaluations across 7 medical Q&A benchmarks show that MA-RAG consistently surpasses competitive inference-time scaling and RAG baselines, delivering a substantial +6.8-point gain in average accuracy over the backbone model. Our code is available at https://github.com/NJU-RL/MA-RAG.
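The multi-round loop the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the sampler, retriever, consensus threshold, and the way conflicts are turned into queries are all hypothetical stand-ins for the paper's components.

```python
from collections import Counter

def consensus(candidates, threshold=0.8):
    """Self-consistency check: return the majority answer and whether
    its vote share reaches the agreement threshold."""
    counts = Counter(candidates)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(candidates) >= threshold

def conflict_query(question, candidates):
    """Turn disagreement among candidates into a retrieval query.
    Hypothetical: the paper derives queries from semantic conflicts;
    here we simply append the disputed answers to the question."""
    disputed = sorted(set(candidates))
    return f"{question} | disputed: {', '.join(disputed)}"

def ma_rag(question, llm_sample, retrieve, rounds=3, n=5, threshold=0.8):
    """Multi-round agentic loop: sample candidate answers, stop early on
    consensus, otherwise retrieve conflict-driven evidence and retry.
    `llm_sample(question, evidence)` and `retrieve(query)` are caller-
    supplied stand-ins for the LLM and the retriever."""
    evidence = []
    for _ in range(rounds):
        candidates = [llm_sample(question, evidence) for _ in range(n)]
        answer, agreed = consensus(candidates, threshold)
        if agreed:
            return answer  # stable consensus reached, stop scaling
        evidence.append(retrieve(conflict_query(question, candidates)))
    return answer  # fall back to the final round's majority vote

# Toy run: the first round of 5 samples is conflicted (A,B,A,B,A),
# then after one retrieval the samples agree on A.
calls = iter("ABABA" + "AAAAA")
result = ma_rag("Q?", lambda q, ev: next(calls), lambda q: "doc")
# result == "A"
```

Each round plays the role of one boosting step: the residual disagreement among candidates is the error signal that drives the next retrieval.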
Problem

Research questions and friction points this paper is trying to address.

medical reasoning
hallucination
outdated knowledge
Retrieval-Augmented Generation
multi-round refinement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Round Agentic RAG
Medical Reasoning
Retrieval-Augmented Generation
Self-Consistency
Test-Time Scaling