Are you SURE? Enhancing Multimodal Pretraining with Missing Modalities through Uncertainty Estimation

📅 2025-04-18
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Multimodal pre-trained models suffer significant performance degradation when modalities are missing, and existing reconstruction methods neglect uncertainty estimation, compromising downstream reliability. To address this, we propose SURE, a framework that jointly models modality reconstruction and uncertainty estimation in the latent space. SURE is the first to embed statistical error propagation into deep neural networks. It introduces a novel Pearson correlation-based loss function that explicitly enforces consistency between reconstruction fidelity and downstream task objectives, and it enables architecture-agnostic, scalable, end-to-end joint optimization. Evaluated on sentiment analysis, genre classification, and action recognition, SURE achieves state-of-the-art performance, markedly improving model robustness, generalization, and interpretability under incomplete multimodal inputs.


πŸ“ Abstract
Multimodal learning has demonstrated incredible successes by integrating diverse data sources, yet it often relies on the availability of all modalities - an assumption that rarely holds in real-world applications. Pretrained multimodal models, while effective, struggle when confronted with small-scale and incomplete datasets (i.e., missing modalities), limiting their practical applicability. Previous studies on reconstructing missing modalities have overlooked the reconstruction's potential unreliability, which could compromise the quality of the final outputs. We present SURE (Scalable Uncertainty and Reconstruction Estimation), a novel framework that extends the capabilities of pretrained multimodal models by introducing latent space reconstruction and uncertainty estimation for both reconstructed modalities and downstream tasks. Our method is architecture-agnostic, reconstructs missing modalities, and delivers reliable uncertainty estimates, improving both interpretability and performance. SURE introduces a unique Pearson Correlation-based loss and applies statistical error propagation in deep networks for the first time, allowing precise quantification of uncertainties from missing data and model predictions. Extensive experiments across tasks such as sentiment analysis, genre classification, and action recognition show that SURE consistently achieves state-of-the-art performance, ensuring robust predictions even in the presence of incomplete data.
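The statistical error propagation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a first-order (delta-method) approximation, independent input dimensions, and a toy network of one linear layer followed by tanh. The function names `propagate_linear` and `propagate_tanh`, and all numeric values, are hypothetical.

```python
import math


def propagate_linear(weights, bias, mean_in, var_in):
    """Propagate mean and variance through y = W x + b.

    Assumes the input dimensions are independent, so
    Var[y_i] = sum_j W_ij^2 * Var[x_j].
    """
    mean_out = [
        sum(w * m for w, m in zip(row, mean_in)) + b
        for row, b in zip(weights, bias)
    ]
    var_out = [sum(w * w * v for w, v in zip(row, var_in)) for row in weights]
    return mean_out, var_out


def propagate_tanh(mean_in, var_in):
    """First-order propagation through tanh: Var[f(x)] ~ f'(mu)^2 * Var[x]."""
    mean_out = [math.tanh(m) for m in mean_in]
    # d/dx tanh(x) = 1 - tanh(x)^2, evaluated at the input mean.
    var_out = [(1.0 - t * t) ** 2 * v for t, v in zip(mean_out, var_in)]
    return mean_out, var_out


# Hypothetical reconstructed latent vector with a per-dimension variance.
latent_mean = [0.5, -1.0]
latent_var = [0.04, 0.09]

# Hypothetical layer parameters.
W = [[1.0, 2.0], [0.0, -1.0]]
b = [0.0, 0.5]

hidden_mean, hidden_var = propagate_linear(W, b, latent_mean, latent_var)
out_mean, out_var = propagate_tanh(hidden_mean, hidden_var)
```

In this sketch the uncertainty attached to a reconstructed latent travels through the layers alongside the activations, so the final prediction carries a variance that reflects how unreliable the reconstruction was.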
Problem

Research questions and friction points this paper is trying to address.

Handling missing modalities in multimodal pretraining models
Addressing unreliability in reconstructed missing modalities
Improving interpretability and performance with uncertainty estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent space reconstruction for missing modalities
Uncertainty estimation via Pearson Correlation loss
Statistical error propagation in deep networks
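A Pearson correlation-based loss of the kind listed above might look like the following minimal sketch. The summary does not specify which quantities the loss correlates, so this example assumes it pairs predicted uncertainties with observed errors and minimizes `1 - r`. The names `pearson_correlation` and `pearson_loss`, and the `eps` stabilizer, are illustrative and not taken from the paper.

```python
import math


def pearson_correlation(xs, ys, eps=1e-8):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    # eps guards against division by zero when a sequence is constant.
    return cov / (norm_x * norm_y + eps)


def pearson_loss(predicted_uncertainty, observed_error):
    """Loss in [0, 2]: near 0 when uncertainty tracks error, near 2 when inverted."""
    return 1.0 - pearson_correlation(predicted_uncertainty, observed_error)


# Hypothetical batch: uncertainty rises with error, so the loss is near 0.
aligned = pearson_loss([0.1, 0.2, 0.3], [1.0, 2.0, 3.0])

# Hypothetical batch: uncertainty falls as error rises, so the loss is near 2.
inverted = pearson_loss([0.3, 0.2, 0.1], [1.0, 2.0, 3.0])
```

Because the correlation is invariant to scale and shift, a loss of this form rewards uncertainties that rank samples correctly by error without forcing them to match the error magnitudes.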
Duy A. Nguyen
PhD Candidate, CS @ UIUC
Machine Learning, Multimodal Learning, LLM
Quan Huu Do
College of Engineering and Computer Science, VinUniversity, Hanoi, Vietnam
Khoa D. Doan
College of Engineering and Computer Science, VinUniversity, Hanoi, Vietnam
Minh N. Do
Professor, University of Illinois at Urbana-Champaign and VinUniversity
signal processing, computational imaging, machine perception, data science