A Multi-Agent System for Complex Reasoning in Radiology Visual Question Answering

📅 2025-08-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address key challenges in Radiology Visual Question Answering (RVQA), including low factual accuracy, frequent hallucinations, and insufficient cross-modal alignment, this paper proposes a multi-agent collaborative framework. It introduces three functionally specialized agents (a Context Interpreter, a Multimodal Reasoner, and an Answer Verifier) that support interpretable, complex reasoning through explicit role division and coordinated interaction. To make evaluation more rigorous, the authors use model disagreement filtering to construct a high-difficulty benchmark set. The framework also integrates multimodal large language models (MLLMs) with retrieval-augmented generation (RAG) to strengthen clinical knowledge grounding and enforce factual constraints. Experiments show that the approach outperforms strong MLLM baselines on challenging RVQA benchmarks, and a case study illustrates its reliability and interpretability.
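
The division of labor among the three agents can be pictured with a minimal Python sketch. This is not the paper's implementation: the `Case` type, the `run_pipeline` function, the callable signatures, and the retry loop after a failed verification are all assumptions made for illustration, and the toy agents below are stubs with no clinical meaning. In a real system each callable would wrap an MLLM or retrieval call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Case:
    """One RVQA item (hypothetical structure, not from the paper)."""
    question: str
    image_id: str  # stand-in for the chest X-ray itself
    trace: List[str] = field(default_factory=list)  # readable reasoning log


# Hypothetical agent signatures: each agent is a plain callable.
ContextFn = Callable[[str, str], str]        # (question, image_id) -> context
ReasonFn = Callable[[str, str, str], str]    # (question, image_id, context) -> answer
VerifyFn = Callable[[str, str, str], bool]   # (question, context, answer) -> accepted?


def run_pipeline(case: Case, interpret: ContextFn, reason: ReasonFn,
                 verify: VerifyFn, max_rounds: int = 2) -> str:
    """Interpret once, then reason and verify for up to max_rounds (assumed loop)."""
    context = interpret(case.question, case.image_id)
    case.trace.append(f"context: {context}")
    answer = ""
    for round_idx in range(max_rounds):
        answer = reason(case.question, case.image_id, context)
        case.trace.append(f"round {round_idx}: answer={answer}")
        if verify(case.question, context, answer):
            case.trace.append("verified")
            return answer
        # Feed the rejection back so the reasoner can revise its answer.
        context += f" | rejected: {answer}"
    case.trace.append("unverified")
    return answer


# Toy stubs, used only to exercise the control flow.
def toy_interpret(question: str, image_id: str) -> str:
    return f"question about {image_id}"


def toy_reason(question: str, image_id: str, context: str) -> str:
    return "no" if "rejected: yes" in context else "yes"


def toy_verify(question: str, context: str, answer: str) -> bool:
    return answer == "no"


case = Case("Is there a pleural effusion?", "cxr_001")
final = run_pipeline(case, toy_interpret, toy_reason, toy_verify)
print(final)
print("\n".join(case.trace))
```

Keeping a per-case trace is one simple way to expose the intermediate steps that the summary associates with interpretability.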

📝 Abstract

Radiology visual question answering (RVQA) provides precise answers to questions about chest X-ray images, alleviating radiologists' workload. While recent methods based on multimodal large language models (MLLMs) and retrieval-augmented generation (RAG) have shown promising progress in RVQA, they still face challenges in factual accuracy, hallucinations, and cross-modal misalignment. We introduce a multi-agent system (MAS) designed to support complex reasoning in RVQA, with specialized agents for context understanding, multimodal reasoning, and answer validation. We evaluate our system on a challenging RVQA set curated via model disagreement filtering, comprising consistently hard cases across multiple MLLMs. Extensive experiments demonstrate the superiority and effectiveness of our system over strong MLLM baselines, with a case study illustrating its reliability and interpretability. This work highlights the potential of multi-agent approaches to support explainable and trustworthy clinical AI applications that require complex reasoning.

Problem

Research questions and friction points this paper is trying to address.

Improving factual accuracy in radiology visual question answering

Reducing hallucinations in multimodal large language models

Addressing cross-modal misalignment in clinical AI applications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent system for complex reasoning

Specialized agents for multimodal tasks

Model disagreement filtering for hard cases
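
Model disagreement filtering can be sketched roughly as follows. The exact criterion used in the paper is not stated on this page, so the rule here is an assumption: an item is kept as "hard" when the share of models giving the most common answer is at or below a threshold. The names `agreement_rate`, `filter_hard_cases`, and `max_agreement`, as well as the toy predictions, are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def agreement_rate(answers: List[str]) -> float:
    """Fraction of models that gave the most common (normalized) answer."""
    counts = Counter(a.strip().lower() for a in answers)
    return counts.most_common(1)[0][1] / len(answers)


def filter_hard_cases(predictions: Dict[str, List[str]],
                      max_agreement: float = 0.5) -> List[str]:
    """Keep question IDs on which the models largely disagree.

    Assumed rule: keep an item when agreement_rate <= max_agreement.
    """
    return [
        qid
        for qid, answers in predictions.items()
        if answers and agreement_rate(answers) <= max_agreement
    ]


# Toy predictions from four hypothetical models per question.
predictions = {
    "q1": ["A", "A", "A", "A"],  # full agreement, dropped
    "q2": ["A", "B", "C", "A"],  # top answer covers 2 of 4, kept
    "q3": ["B", "b", "C", "D"],  # "B" and "b" normalize together, 2 of 4, kept
}
hard_ids = filter_hard_cases(predictions)
print(hard_ids)
```

A correctness-based variant, keeping items that most models answer wrongly, would fit the phrase "consistently hard cases" equally well; which one the authors used cannot be determined from this page.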

🔎 Similar Papers

Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering