MedPatch: Confidence-Guided Multi-Stage Fusion for Multimodal Clinical Data

📅 2025-08-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenges of high heterogeneity, limited sample size, and frequent modality missingness in clinical multimodal data (e.g., time-series, imaging, and textual), this paper proposes a confidence-guided multi-stage fusion framework. The method innovatively integrates joint and late fusion strategies, incorporating a modality-aware module and a token-level confidence calibration mechanism; dynamically adaptive multi-stage feature fusion is achieved via latent token-block clustering. Evaluated on MIMIC-III/IV, the framework significantly improves in-hospital mortality prediction and clinical diagnosis classification, achieving state-of-the-art (SOTA) performance. Key contributions include: (i) the first application of confidence-driven token clustering for dynamic multimodal fusion in clinical settings; (ii) robust modeling under incomplete modality inputs; and (iii) strong generalization with limited labeled data.

Technology Category

Application Category

📝 Abstract
Clinical decision-making relies on the integration of information across various data modalities, such as clinical time-series, medical images and textual reports. Compared to other domains, real-world medical data is heterogeneous in nature, limited in size, and sparse due to missing modalities. This significantly limits model performance in clinical prediction tasks. Inspired by clinical workflows, we introduce MedPatch, a multi-stage multimodal fusion architecture, which seamlessly integrates multiple modalities via confidence-guided patching. MedPatch comprises three main components: (i) a multi-stage fusion strategy that leverages joint and late fusion simultaneously, (ii) a missingness-aware module that handles sparse samples with missing modalities, (iii) a joint fusion module that clusters latent token patches based on calibrated unimodal token-level confidence. We evaluated MedPatch using real-world data consisting of clinical time-series data, chest X-ray images, radiology reports, and discharge notes extracted from the MIMIC-IV, MIMIC-CXR, and MIMIC-Notes datasets on two benchmark tasks, namely in-hospital mortality prediction and clinical condition classification. Compared to existing baselines, MedPatch achieves state-of-the-art performance. Our work highlights the effectiveness of confidence-guided multi-stage fusion in addressing the heterogeneity of multimodal data, and establishes new state-of-the-art benchmark results for clinical prediction tasks.
Problem

Research questions and friction points this paper is trying to address.

Integrates heterogeneous clinical data modalities for decision-making
Handles missing modalities in sparse medical datasets
Improves clinical prediction tasks via multi-stage fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-stage fusion combining joint and late fusion
Missingness-aware module for sparse multimodal data
Confidence-guided clustering of latent token patches
🔎 Similar Papers
No similar papers found.