AI Summary
To address the insufficient modeling of dynamic disease progression across sequential chest X-ray studies in radiology report generation (RRG), this paper proposes a temporal-aware multimodal large language model (MLLM) framework. Methodologically, it pairs a radiology-specific image encoder with a Temporal Alignment Connector (TAC) to enable fine-grained difference modeling and cross-temporal semantic alignment between the current and prior studies. An end-to-end joint training strategy fully exploits the clinical reasoning cues embedded in multi-temporal, multimodal data. Evaluated on the MIMIC-CXR dataset, the approach achieves significant improvements in clinical relevance (+3.2%) and lexical accuracy (+2.8%), establishing new state-of-the-art performance among models of comparable scale. To the best of our knowledge, this is the first work to realize time-sensitive vision-language co-modeling explicitly tailored to radiological tasks.
Abstract
Radiology report generation (RRG) requires advanced medical image analysis, effective temporal reasoning, and accurate text generation. While multimodal large language models (MLLMs) that incorporate pre-trained vision encoders have enhanced visual-language understanding, most existing methods rely on single-image analysis or rule-based heuristics to process multiple images, failing to fully leverage the temporal information in multimodal medical datasets. In this paper, we introduce Libra, a temporal-aware MLLM tailored for chest X-ray report generation. Libra combines a radiology-specific image encoder with a novel Temporal Alignment Connector (TAC), designed to accurately capture and integrate temporal differences between paired current and prior images. Extensive experiments on the MIMIC-CXR dataset demonstrate that Libra establishes a new state of the art among similarly scaled MLLMs, setting new standards in both clinical relevance and lexical accuracy.
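For intuition only, below is a minimal PyTorch sketch of what a TAC-style connector could look like: it projects current and prior image features into a shared space, models their fine-grained difference, and fuses the result into visual tokens for the language model. The class, layer names, dimensions, and fusion scheme are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class TemporalAlignmentConnectorSketch(nn.Module):
    """Hypothetical TAC-style connector (illustrative, not the paper's design).

    Aligns current and prior study features, models their temporal
    difference, and fuses both into visual tokens for an LLM.
    """

    def __init__(self, vis_dim: int, llm_dim: int):
        super().__init__()
        self.proj_cur = nn.Linear(vis_dim, llm_dim)    # project current-study features
        self.proj_prior = nn.Linear(vis_dim, llm_dim)  # project prior-study features
        # fuse [current, temporal difference] into a single token stream
        self.fuse = nn.Linear(2 * llm_dim, llm_dim)

    def forward(self, cur_feats: torch.Tensor, prior_feats: torch.Tensor) -> torch.Tensor:
        cur = self.proj_cur(cur_feats)        # (B, N, llm_dim)
        prior = self.proj_prior(prior_feats)  # (B, N, llm_dim)
        diff = cur - prior                    # crude stand-in for difference modeling
        return self.fuse(torch.cat([cur, diff], dim=-1))  # (B, N, llm_dim)

# Usage with dummy patch features from a (frozen) radiology image encoder:
cur = torch.randn(1, 196, 768)    # current study: 196 patch tokens, dim 768
prior = torch.randn(1, 196, 768)  # prior study, same layout
tac = TemporalAlignmentConnectorSketch(vis_dim=768, llm_dim=1024)
tokens = tac(cur, prior)
print(tokens.shape)  # torch.Size([1, 196, 1024]) -> visual tokens for the LLM
```

A simple subtraction is the weakest plausible difference model; the point of the sketch is only the interface, where paired current/prior features enter and a single temporally informed token stream exits for end-to-end training with the language model.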