🤖 AI Summary
Radiology report generation aims to alleviate physician workload and mitigate geographic disparities in healthcare access, yet it faces challenges in fusing heterogeneous multimodal data (medical images, clinical text, and domain-specific knowledge) while ensuring interpretable outputs. This paper systematically reviews over 100 state-of-the-art works and proposes a standardized workflow for multimodal report generation with five main components: multi-modality data acquisition, data preparation, feature learning, feature fusion and interaction, and report generation. It brings large language models, contrastive learning, knowledge graphs, and multimodal alignment techniques together within a single framework, and benchmarks methods quantitatively on metrics including BLEU, CIDEr, and CHAIR under a unified experimental setting. The survey also synthesizes mainstream datasets (e.g., MIMIC-CXR, CheXpert) and evaluation protocols, distills key technical challenges, and provides a structured roadmap for algorithm development and future research.
📝 Abstract
Automatic radiology report generation can alleviate the workload of physicians and reduce regional disparities in medical resources, making it an important topic in the medical image analysis field. It is a challenging task: the computational model must mimic physicians, drawing information from multi-modal input data (i.e., medical images, clinical information, medical knowledge, etc.) to produce comprehensive and accurate reports. Recently, numerous works have addressed this problem with deep-learning-based methods such as transformers, contrastive learning, and knowledge-base construction. This survey summarizes the key techniques developed in the most recent works and proposes a general workflow for deep-learning-based report generation with five main components: multi-modality data acquisition, data preparation, feature learning, feature fusion and interaction, and report generation. The state-of-the-art methods for each of these components are highlighted. Additionally, we summarize the latest developments in large-model-based methods and model explainability, along with public datasets, evaluation methods, current challenges, and future directions in this field. We have also conducted a quantitative comparison between different methods under the same experimental setting. To our knowledge, this is the most up-to-date survey focusing on multi-modality inputs and data fusion for radiology report generation. It aims to provide comprehensive and rich information for researchers interested in automatic clinical report generation and medical image analysis, especially those using multimodal inputs, and to assist them in developing new algorithms to advance the field.
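To make the n-gram evaluation metrics mentioned above concrete, here is a minimal, self-contained sketch of sentence-level BLEU (clipped n-gram precision with a brevity penalty). It is an illustration only, not the survey's benchmarking platform: it uses a single reference and no smoothing, whereas practical report-generation evaluations typically use corpus-level BLEU with smoothing (e.g., via NLTK or pycocoevalcap).

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(candidate, reference, max_n=4):
    """Single-reference, unsmoothed sentence BLEU-4 (illustrative sketch)."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if clipped == 0:
            return 0.0  # unsmoothed: any zero precision makes BLEU 0
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(candidate) > len(reference) else math.exp(
        1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(log_precision_sum)
```

For example, a generated report identical to the reference scores 1.0, while a shorter-but-correct prefix is penalized by the brevity penalty.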