Unbiased Dynamic Multimodal Fusion

📅 2026-03-20
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses critical limitations of existing dynamic multimodal fusion approaches, which rely on heuristic metrics to assess modality quality, fail under extreme noise conditions, and overlook inherent inter-modality dependency biases, leading to doubly suppressed learning of challenging modalities. To overcome these issues, the paper proposes an Unbiased Dynamic Multimodal Learning (UDML) framework that integrates controlled noise injection with uncertainty prediction to construct a noise-aware uncertainty estimator. Furthermore, UDML explicitly quantifies dependency bias through a modality dropout strategy, enabling adaptive and unbiased weighting of modality contributions. Extensive experiments across multiple multimodal benchmark tasks show that UDML significantly outperforms both static and state-of-the-art dynamic fusion methods, demonstrating strong effectiveness, broad applicability, and robust generalization.

📝 Abstract
Traditional multimodal methods often assume static modality quality, which limits their adaptability in dynamic real-world scenarios. Dynamic multimodal methods have therefore been proposed to assess modality quality and adjust each modality's contribution accordingly. However, they typically rely on empirical metrics and fail to measure modality quality when noise levels are extremely low or high. Moreover, existing methods usually assume that the initial contribution of each modality is equal, neglecting the intrinsic modality dependency bias. As a result, the hard-to-learn modality is doubly penalized, and the performance of dynamic fusion can fall below that of static fusion. To address these challenges, we propose the Unbiased Dynamic Multimodal Learning (UDML) framework. Specifically, we introduce a noise-aware uncertainty estimator that adds controlled noise to the modality data and predicts its intensity from the modality feature. This forces the model to learn a clear correspondence between feature corruption and noise level, allowing accurate uncertainty measurement across both low- and high-noise conditions. Furthermore, we quantify the inherent modality reliance bias within multimodal networks via modality dropout and incorporate it into the weighting mechanism. This eliminates the dual suppression effect on the hard-to-learn modality. Extensive experiments across diverse multimodal benchmark tasks validate the effectiveness, versatility, and generalizability of the proposed UDML. The code is available at https://github.com/shicaiwei123/UDML.
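The core idea of the noise-aware uncertainty estimator, as described in the abstract, is to inject noise of a known intensity and train a predictor to recover that intensity from the corrupted feature. The sketch below illustrates this with a toy 1-D signal; the corruption feature, the linear regressor, and all numeric values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_noise(x, sigma):
    """Corrupt a clean modality sample with Gaussian noise of known level sigma."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def corruption_feature(x):
    """Toy scalar feature that grows with corruption (mean absolute first difference)."""
    return np.mean(np.abs(np.diff(x)))

# Build supervision pairs (feature, sigma): the estimator must learn the
# correspondence between feature corruption and the injected noise intensity.
x_clean = np.sin(np.linspace(0, 4 * np.pi, 256))
sigmas = rng.uniform(0.0, 1.0, size=500)
feats = np.array([corruption_feature(inject_noise(x_clean, s)) for s in sigmas])

# Fit a 1-D linear regressor feature -> sigma via least squares.
A = np.stack([feats, np.ones_like(feats)], axis=1)
w, b = np.linalg.lstsq(A, sigmas, rcond=None)[0]

# Predict the uncertainty (noise level) of a new corrupted sample.
sigma_true = 0.5
sigma_hat = w * corruption_feature(inject_noise(x_clean, sigma_true)) + b
```

In the paper's setting, the regressor would be a learned head on deep modality features rather than a hand-crafted statistic, but the supervision signal is the same: the known injected noise level.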
Problem

Research questions and friction points this paper is trying to address.

dynamic multimodal fusion
modality quality assessment
modality bias
noise robustness
uncertainty estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unbiased Dynamic Fusion
Noise-aware Uncertainty Estimation
Modality Dependency Bias
Controlled Noise Injection
Modality Dropout
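The last two ideas, modality dropout and unbiased weighting, can be sketched together: dropping one modality at a time reveals how much the network already relies on it, and that bias term corrects the uncertainty-based weights so a hard-to-learn modality is not doubly suppressed. All names, accuracy numbers, and the 1/u weighting form below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Hypothetical accuracies of a toy two-modality network.
acc_full = 0.90                           # both modalities present
acc_drop = {"rgb": 0.60, "depth": 0.86}   # accuracy when that modality is dropped

# Dependency bias: performance drop when a modality is removed, i.e. how
# strongly the network already relies on it.
bias = {m: acc_full - a for m, a in acc_drop.items()}
# rgb is heavily relied on (0.30); depth is under-used / hard to learn (0.04)

# Per-modality uncertainties, e.g. from a noise-aware estimator.
u = {"rgb": 0.2, "depth": 0.4}

# Unbiased weighting sketch: divide the confidence (1/u) by the dependency
# bias, so the under-used modality is boosted instead of doubly suppressed.
raw = {m: (1.0 / u[m]) / (bias[m] + 1e-6) for m in u}
total = sum(raw.values())
weights = {m: r / total for m, r in raw.items()}
```

With these toy numbers, the depth modality receives the larger fusion weight despite its higher uncertainty, because its low dependency bias shows the network has been under-using it.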
Shicai Wei
University of Electronic Science and Technology of China, UESTC
multimodal learning
Kaijie Zhang
University of Electronic Science and Technology of China
Luyi Chen
University of Electronic Science and Technology of China
Tao He
UESTC
Image Retrieval · Computer Vision
Guiduo Duan
University of Electronic Science and Technology of China