Libra: Leveraging Temporal Images for Biomedical Radiology Analysis

📅 2024-11-28
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the insufficient modeling of disease progression across sequential chest X-ray images in radiology report generation (RRG), this paper proposes Libra, a temporal-aware multimodal large language model (MLLM). The model pairs a radiology-specific image encoder with a Temporal Alignment Connector (TAC), which models fine-grained differences between the current and prior studies and aligns them semantically across time. An end-to-end joint training strategy is adopted to exploit the clinical reasoning cues embedded in multi-temporal, multimodal data. Evaluated on the MIMIC-CXR dataset, the approach improves clinical relevance (+3.2%) and lexical accuracy (+2.8%), establishing new state-of-the-art performance among models of comparable scale. The work is presented as the first to realize time-sensitive vision-language co-modeling explicitly tailored for radiological tasks.

๐Ÿ“ Abstract
Radiology report generation (RRG) requires advanced medical image analysis, effective temporal reasoning, and accurate text generation. While multimodal large language models (MLLMs) align with pre-trained vision encoders to enhance visual-language understanding, most existing methods rely on single-image analysis or rule-based heuristics to process multiple images, failing to fully leverage temporal information in multi-modal medical datasets. In this paper, we introduce Libra, a temporal-aware MLLM tailored for chest X-ray report generation. Libra combines a radiology-specific image encoder with a novel Temporal Alignment Connector (TAC), designed to accurately capture and integrate temporal differences between paired current and prior images. Extensive experiments on the MIMIC-CXR dataset demonstrate that Libra establishes a new state-of-the-art benchmark among similarly scaled MLLMs, setting new standards in both clinical relevance and lexical accuracy.
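The abstract does not specify how the TAC works internally. As a rough illustration only, the sketch below shows one common way to integrate a prior image into the representation of a current image: patch tokens from the current study attend to patch tokens from the prior study, and the result is projected to the language model's embedding width. The function name `fuse_current_with_prior`, the use of plain cross-attention, the residual connection, and all dimensions are assumptions made for this example and are not taken from the paper.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_current_with_prior(current, prior, w_out):
    """Hypothetical temporal fusion (not the paper's TAC).

    current: n_cur x d patch features of the current image
    prior:   n_pri x d patch features of the prior image
    w_out:   d x d_llm projection to the language model width
    Returns (n_cur x d_llm tokens, n_cur x n_pri attention weights).
    """
    d = len(current[0])
    prior_t = [list(col) for col in zip(*prior)]   # d x n_pri
    scores = matmul(current, prior_t)              # n_cur x n_pri
    attn = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    context = matmul(attn, prior)                  # n_cur x d
    # Residual connection keeps the current study as the primary signal.
    fused = [[c + p for c, p in zip(c_row, p_row)]
             for c_row, p_row in zip(current, context)]
    return matmul(fused, w_out), attn

def random_matrix(rng, rows, cols):
    """Stand-in for encoder outputs or learned weights (random values)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n_patches, d_vis, d_llm = 4, 8, 16   # toy sizes, chosen for readability
current = random_matrix(rng, n_patches, d_vis)
prior = random_matrix(rng, n_patches, d_vis)
w_out = random_matrix(rng, d_vis, d_llm)

tokens, attn = fuse_current_with_prior(current, prior, w_out)
print(len(tokens), len(tokens[0]))  # 4 16
```

Each row of `attn` sums to 1, so every current-image patch receives a weighted mix of prior-image patches. A real connector would use learned query, key, and value projections and would need a policy for studies that have no prior image.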
Problem

Research questions and friction points this paper is trying to address.

Leveraging temporal images in radiology
Enhancing medical report generation accuracy
Integrating temporal differences in medical datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal Alignment Connector
Radiology-specific image encoder
Multimodal large language model
Xi Zhang
Information Retrieval Group, AI4BioMed Lab, School of Computing Science, University of Glasgow
Zaiqiao Meng
Lecturer at University of Glasgow, Affiliated Lecturer at University of Cambridge
AI Agents, NLP & IR, Knowledge Graph, AI4Biomedicine, Machine Learning
Jake Lever
University of Glasgow
Biomedical text mining, Machine learning, Precision medicine
Edmond S.L. Ho
Information Retrieval Group, AI4BioMed Lab, School of Computing Science, University of Glasgow