🤖 AI Summary
In clinical medical imaging analysis, tabular data often exhibits large-scale, arbitrary-rate (0%–100%) missingness at inference time, a critical yet underaddressed challenge. To address it, we propose a missingness-robust cross-modal fusion framework. Methodologically, we pioneer modeling tabular missingness as a data augmentation strategy for contrastive pretraining; design a Tabular More vs. Fewer ranking loss to enable missingness-rate-invariant representation learning; and introduce gated cross-modal attention with decoupled gradient updates to separate missingness-sensitive and missingness-robust feature pathways. Evaluated on UK Biobank cardiac MRI, our method significantly outperforms state-of-the-art approaches and generalizes effectively to an external cardiac MRI dataset and a natural-image (car advertisement) dataset, demonstrating strong cross-domain robustness. This work establishes a novel paradigm for trustworthy multimodal medical AI under sparse clinical data conditions.
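To make the first idea concrete, here is a minimal sketch of treating tabular missingness as a stochastic data augmentation for contrastive pretraining: each view of a sample drops a random subset of attributes at a rate drawn uniformly from [0%, 100%]. The function name, the NaN placeholder, and the toy attribute values are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def missingness_augment(x, rng, max_rate=1.0, fill_value=np.nan):
    """Randomly mask tabular attributes at an arbitrary rate in [0, max_rate].

    x          : 1-D array of tabular attributes for one sample
    rng        : numpy Generator, for reproducibility
    fill_value : placeholder written into the masked (missing) entries
    """
    rate = rng.uniform(0.0, max_rate)      # draw this view's missingness rate
    mask = rng.random(x.shape[0]) < rate   # choose which attributes to drop
    x_aug = x.copy()
    x_aug[mask] = fill_value
    return x_aug, mask

rng = np.random.default_rng(0)
x = np.array([63.0, 1.0, 120.0, 5.4])      # hypothetical: age, sex, BP, LVEF
view_a, _ = missingness_augment(x, rng)    # two stochastic views of the same
view_b, _ = missingness_augment(x, rng)    # sample, paired in a contrastive loss
```

Because the rate itself is random per view, the encoder sees everything from fully observed to fully missing tabular input during pretraining, which is what the robustness claim relies on.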
📝 Abstract
Large-scale medical biobanks provide imaging data complemented by extensive tabular information, such as demographics or clinical measurements. However, this abundance of tabular attributes does not reflect real-world datasets, where only a subset of attributes may be available. This discrepancy calls for methods that can leverage all the tabular data during training while remaining robust to missing values at inference. To address this challenge, we propose RoVTL (Robust Vision-Tabular Learning), a framework designed to handle any level of tabular data availability, from 0% to 100%. RoVTL comprises two key stages: contrastive pretraining, where we introduce tabular attribute missingness as a data augmentation to promote robustness, and downstream task tuning using a gated cross-attention module for multimodal fusion. During fine-tuning, we employ a novel Tabular More vs. Fewer loss that ranks performance based on the amount of available tabular data. Combined with disentangled gradient learning, this enables consistent performance across all tabular data completeness scenarios. We evaluate RoVTL on cardiac MRI scans from the UK Biobank, demonstrating superior robustness to missing tabular data compared to prior methods. Furthermore, RoVTL generalizes to an external cardiac MRI dataset for multimodal disease classification, and extends to the natural-image domain, achieving robust performance on a car advertisements dataset. The code is available at https://github.com/marteczkah/RoVTL.
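The "Tabular More vs. Fewer" loss ranks the model's performance by how much tabular data it received. A natural way to realize such a ranking objective is a hinge-style penalty whenever the sparser view outscores the fuller view; the exact functional form below, including the margin term, is an assumption for illustration rather than the paper's definition.

```python
import numpy as np

def more_vs_fewer_loss(score_more, score_fewer, margin=0.0):
    """Hinge-style ranking loss: predictions made with MORE tabular
    attributes should score at least as well as those made with FEWER.

    score_more  : per-sample task scores using the fuller tabular view
    score_fewer : per-sample task scores using the sparser tabular view
    margin      : optional slack enforcing a strict gap between the two
    """
    # Penalize only the samples where the fewer-attribute view wins.
    return np.maximum(0.0, margin + score_fewer - score_more).mean()

# Toy check: the desired ranking already holds, so the loss vanishes.
s_more  = np.array([0.9, 0.8])
s_fewer = np.array([0.6, 0.7])
print(more_vs_fewer_loss(s_more, s_fewer))   # 0.0
```

Minimizing this term pushes the fused representation to degrade gracefully as attributes are removed, complementing the disentangled gradient updates that keep missingness-sensitive and missingness-robust pathways separate.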