A Multimodal Multi-Agent Framework for Radiology Report Generation

📅 2025-05-14
🤖 AI Summary
This work addresses three persistent challenges in radiology report generation (RRG) from medical images: factual inconsistency, hallucination, and cross-modal misalignment. It proposes a multi-agent framework that follows the stepwise clinical reasoning workflow and splits the RRG pipeline across five task-specific agents: retrieval, draft generation, visual analysis, refinement, and structured synthesis. The design combines multimodal large language models (MLLMs) with retrieval-augmented generation (RAG). Unlike prior approaches, the architecture is explicitly designed to emulate clinical reasoning pathways, with the aim of improving the factual accuracy, structural conformity, and interpretability of generated reports. In evaluations using both automatic metrics and LLM-based assessment, the framework outperforms a strong baseline and produces more accurate, structured, and interpretable reports.

📝 Abstract
Radiology report generation (RRG) aims to automatically produce diagnostic reports from medical images, with the potential to enhance clinical workflows and reduce radiologists' workload. While recent approaches leveraging multimodal large language models (MLLMs) and retrieval-augmented generation (RAG) have achieved strong results, they continue to face challenges such as factual inconsistency, hallucination, and cross-modal misalignment. We propose a multimodal multi-agent framework for RRG that aligns with the stepwise clinical reasoning workflow, where task-specific agents handle retrieval, draft generation, visual analysis, refinement, and synthesis. Experimental results demonstrate that our approach outperforms a strong baseline in both automatic metrics and LLM-based evaluations, producing more accurate, structured, and interpretable reports. This work highlights the potential of clinically aligned multi-agent frameworks to support explainable and trustworthy clinical AI applications.

Problem

Research questions and friction points this paper is trying to address.

Addressing factual inconsistency in radiology report generation
Reducing hallucination in multimodal medical image analysis
Improving cross-modal alignment for clinical reasoning workflows

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal multi-agent framework for RRG
Task-specific agents for clinical workflow
Improved accuracy and interpretability
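
The stepwise agent workflow named in the abstract (retrieval, draft generation, visual analysis, refinement, synthesis) can be pictured as a sequential pipeline over a shared case state. The sketch below is only an illustration under stated assumptions: the `CaseState` fields, the function names, the section labels, and the bracketed placeholder strings are all hypothetical, and each agent is a stub standing in for whatever MLLM or RAG call the actual system makes. It is not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class CaseState:
    """Shared state handed from agent to agent (hypothetical structure)."""
    image_id: str
    retrieved_reports: List[str] = field(default_factory=list)
    draft: str = ""
    visual_findings: List[str] = field(default_factory=list)
    refined: str = ""
    final_report: Dict[str, str] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)  # names of agents that ran


def retrieval_agent(state: CaseState) -> CaseState:
    # Stub: a real agent would look up reports of similar cases.
    state.retrieved_reports = [f"[similar report for {state.image_id}]"]
    return state


def draft_agent(state: CaseState) -> CaseState:
    # Stub: a real agent would prompt a language model with the retrieved text.
    state.draft = "Draft based on: " + " ".join(state.retrieved_reports)
    return state


def visual_agent(state: CaseState) -> CaseState:
    # Stub: a real agent would query a multimodal model about the image.
    state.visual_findings = [f"[visual finding for {state.image_id}]"]
    return state


def refinement_agent(state: CaseState) -> CaseState:
    # Stub: append any visual finding that the draft does not already mention.
    missing = [f for f in state.visual_findings if f not in state.draft]
    state.refined = " ".join([state.draft, *missing]).strip()
    return state


def synthesis_agent(state: CaseState) -> CaseState:
    # Stub: arrange the refined text into named sections (labels are assumed).
    state.final_report = {
        "FINDINGS": state.refined,
        "IMPRESSION": "[impression placeholder]",
    }
    return state


# Agent order follows the sequence listed in the abstract.
PIPELINE: List[Tuple[str, Callable[[CaseState], CaseState]]] = [
    ("retrieval", retrieval_agent),
    ("draft", draft_agent),
    ("visual", visual_agent),
    ("refinement", refinement_agent),
    ("synthesis", synthesis_agent),
]


def generate_report(image_id: str) -> CaseState:
    """Run every agent in order and record which ones ran."""
    state = CaseState(image_id=image_id)
    for name, agent in PIPELINE:
        state = agent(state)
        state.trace.append(name)
    return state


if __name__ == "__main__":
    result = generate_report("img-001")
    print(result.trace)
    print(result.final_report)
```

Keeping every intermediate output on one state object is one plausible way to get the interpretability the summary emphasizes, since each agent's contribution can be inspected after the run. A real system might instead pass messages between agents or let the refinement step loop.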