Calibrated Multimodal Representation Learning with Missing Modalities

📅 2025-11-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multimodal representation learning often suffers from alignment bias when modalities are missing: existing methods assume every modality is available for each instance, limiting their applicability to real-world data with pervasive partial observations. This paper proposes CalMRL, a framework that analyzes incomplete alignment through the lens of anchor shift and introduces a modality-prior- and correlation-driven calibration mechanism at the representation level. CalMRL performs representation-level completion of missing modalities and adopts a bi-step learning strategy with a closed-form solution based on the posterior distribution of the shared latent variable. Experiments show that CalMRL mitigates anchor shift, improves convergence stability, and achieves state-of-the-art performance across multiple missing-modality benchmarks, yielding more robust cross-modal fusion under heterogeneous modality absence.

📝 Abstract
Multimodal representation learning harmonizes distinct modalities by aligning them into a unified latent space. Recent research generalizes traditional cross-modal alignment to produce enhanced multimodal synergy but requires all modalities to be present for a common instance, making it challenging to utilize prevalent datasets with missing modalities. We provide theoretical insights into this issue from an anchor shift perspective: observed modalities are aligned with a local anchor that deviates from the optimal one attained when all modalities are present, resulting in an inevitable shift. To address this, we propose CalMRL for multimodal representation learning to calibrate incomplete alignments caused by missing modalities. Specifically, CalMRL leverages the priors and the inherent connections among modalities to model the imputation of the missing ones at the representation level. To resolve the optimization dilemma, we employ a bi-step learning method with the closed-form solution of the posterior distribution of the shared latents. We theoretically validate its mitigation of anchor shift and its convergence. By equipping an existing advanced method with the calibrated alignment, we offer new flexibility to absorb data with missing modalities, which was previously unattainable. Extensive experiments and comprehensive analyses demonstrate the superiority of CalMRL. Our code, model checkpoints, and raw evaluation data will be publicly released.
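The anchor shift described in the abstract can be illustrated with a minimal NumPy sketch. The embeddings, modality names, and the centroid-style anchor below are illustrative assumptions, not the paper's actual model: the point is only that the anchor computed from observed modalities deviates from the one computed when all modalities are present.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(v):
    # Project a vector onto the unit sphere, as is common in alignment objectives.
    return v / np.linalg.norm(v)

# Hypothetical unit-normalized embeddings of one instance across three modalities.
image = normalize(rng.normal(size=8))
text = normalize(rng.normal(size=8))
audio = normalize(rng.normal(size=8))

# "Optimal" anchor: centroid over all modalities (full-modality alignment).
full_anchor = normalize((image + text + audio) / 3)

# Local anchor: centroid over observed modalities only (audio is missing).
local_anchor = normalize((image + text) / 2)

# The deviation between the two anchors is the shift that CalMRL calibrates.
shift = np.linalg.norm(full_anchor - local_anchor)
print(f"anchor shift: {shift:.3f}")
```

Aligning the observed modalities to `local_anchor` instead of `full_anchor` biases the learned representation whenever modalities are missing, which is the failure mode the calibration targets.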
Problem

Research questions and friction points this paper is trying to address.

Addressing multimodal representation learning with missing modalities
Calibrating incomplete alignments caused by absent modalities
Modeling missing modality imputation using representation-level priors
Innovation

Methods, ideas, or system contributions that make the work stand out.

CalMRL calibrates incomplete alignments from missing modalities
Leverages modality priors for representation-level imputation
Employs bi-step learning with closed-form latent solutions
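To make the "closed-form posterior of a shared latent" idea concrete, here is a minimal sketch under a linear-Gaussian assumption (x_m = W_m z + noise, z ~ N(0, I)); this is a standard factor-analysis-style model chosen for illustration, not necessarily CalMRL's exact formulation, and all loadings and dimensions below are hypothetical. The posterior over z given any subset of observed modality representations is Gaussian in closed form, and the missing modality can then be imputed at the representation level from its mean.

```python
import numpy as np

rng = np.random.default_rng(1)
d_z, d_x, sigma2 = 4, 8, 0.1  # latent dim, per-modality dim, noise variance

# Hypothetical per-modality loading matrices W_m.
W = {m: rng.normal(size=(d_x, d_z)) for m in ("image", "text", "audio")}

def posterior(observed):
    """Closed-form Gaussian posterior over the shared latent z given any
    subset of observed modality representations {name: vector}."""
    prec = np.eye(d_z)          # prior precision from z ~ N(0, I)
    rhs = np.zeros(d_z)
    for m, x in observed.items():
        prec += W[m].T @ W[m] / sigma2
        rhs += W[m].T @ x / sigma2
    cov = np.linalg.inv(prec)
    return cov @ rhs, cov       # posterior mean and covariance

# An instance where the audio modality is missing.
z_true = rng.normal(size=d_z)
observed = {m: W[m] @ z_true + np.sqrt(sigma2) * rng.normal(size=d_x)
            for m in ("image", "text")}

mu, cov = posterior(observed)

# Representation-level imputation of the missing modality from the posterior mean.
audio_hat = W["audio"] @ mu
print("latent recovery error:", np.linalg.norm(mu - z_true))
```

Because the posterior is available in closed form, an alternating (bi-step) scheme can infer the shared latent from observed modalities in one step and update the alignment with the imputed representations in the other, which matches the optimization structure the bullets describe at a high level.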