Unbiased Dynamic Multimodal Fusion

📅 2026-03-20
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses critical limitations of existing dynamic multimodal fusion approaches, which rely on heuristic metrics to assess modality quality, fail under extreme noise conditions, and overlook inherent inter-modality dependency biases, leading to doubly suppressed learning of challenging modalities. To overcome these issues, the paper proposes an Unbiased Dynamic Multimodal Learning (UDML) framework that integrates controlled noise injection with uncertainty prediction to construct a noise-aware uncertainty estimator. Furthermore, UDML explicitly quantifies dependency bias through a modality dropout strategy, enabling adaptive and unbiased weighting of modality contributions. Extensive experiments across multiple multimodal benchmark tasks show that UDML significantly outperforms both static and state-of-the-art dynamic fusion methods, demonstrating strong effectiveness, broad applicability, and robust generalization.

📝 Abstract
Traditional multimodal methods often assume static modality quality, which limits their adaptability in dynamic real-world scenarios. Dynamic multimodal methods have therefore been proposed to assess modality quality and adjust each modality's contribution accordingly. However, they typically rely on empirical metrics and fail to measure modality quality when noise levels are extremely low or high. Moreover, existing methods usually assume that the initial contribution of each modality is equal, neglecting the intrinsic modality dependency bias. As a result, the hard-to-learn modality is doubly penalized, and the performance of dynamic fusion can fall below that of static fusion. To address these challenges, we propose the Unbiased Dynamic Multimodal Learning (UDML) framework. Specifically, we introduce a noise-aware uncertainty estimator that adds controlled noise to the modality data and predicts its intensity from the modality feature. This forces the model to learn a clear correspondence between feature corruption and noise level, allowing accurate uncertainty measurement across both low- and high-noise conditions. Furthermore, we quantify the inherent modality reliance bias within multimodal networks via modality dropout and incorporate it into the weighting mechanism. This eliminates the dual suppression effect on the hard-to-learn modality. Extensive experiments across diverse multimodal benchmark tasks validate the effectiveness, versatility, and generalizability of the proposed UDML. The code is available at https://github.com/shicaiwei123/UDML.
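The core idea of the noise-aware uncertainty estimator, as described in the abstract, is to inject noise of a known intensity and train a predictor to recover that intensity from the corrupted feature. The sketch below illustrates this with a toy 1-D signal; the corruption feature, the linear regressor, and all numeric values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_noise(x, sigma):
    """Corrupt a clean modality sample with Gaussian noise of known level sigma."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def corruption_feature(x):
    """Toy scalar feature that grows with corruption (mean absolute first difference)."""
    return np.mean(np.abs(np.diff(x)))

# Build supervision pairs (feature, sigma): the estimator must learn the
# correspondence between feature corruption and the injected noise intensity.
x_clean = np.sin(np.linspace(0, 4 * np.pi, 256))
sigmas = rng.uniform(0.0, 1.0, size=500)
feats = np.array([corruption_feature(inject_noise(x_clean, s)) for s in sigmas])

# Fit a 1-D linear regressor feature -> sigma via least squares.
A = np.stack([feats, np.ones_like(feats)], axis=1)
w, b = np.linalg.lstsq(A, sigmas, rcond=None)[0]

# Predict the uncertainty (noise level) of a new corrupted sample.
sigma_true = 0.5
sigma_hat = w * corruption_feature(inject_noise(x_clean, sigma_true)) + b
```

In the paper's setting, the regressor would be a learned head on deep modality features rather than a hand-crafted statistic, but the supervision signal is the same: the known injected noise level.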
Problem

Research questions and friction points this paper is trying to address.

dynamic multimodal fusion
modality quality assessment
modality bias
noise robustness
uncertainty estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unbiased Dynamic Fusion
Noise-aware Uncertainty Estimation
Modality Dependency Bias
Controlled Noise Injection
Modality Dropout
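The last two ideas, modality dropout and unbiased weighting, can be sketched together: dropping one modality at a time reveals how much the network already relies on it, and that bias term corrects the uncertainty-based weights so a hard-to-learn modality is not doubly suppressed. All names, accuracy numbers, and the 1/u weighting form below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Hypothetical accuracies of a toy two-modality network.
acc_full = 0.90                           # both modalities present
acc_drop = {"rgb": 0.60, "depth": 0.86}   # accuracy when that modality is dropped

# Dependency bias: performance drop when a modality is removed, i.e. how
# strongly the network already relies on it.
bias = {m: acc_full - a for m, a in acc_drop.items()}
# rgb is heavily relied on (0.30); depth is under-used / hard to learn (0.04)

# Per-modality uncertainties, e.g. from a noise-aware estimator.
u = {"rgb": 0.2, "depth": 0.4}

# Unbiased weighting sketch: divide the confidence (1/u) by the dependency
# bias, so the under-used modality is boosted instead of doubly suppressed.
raw = {m: (1.0 / u[m]) / (bias[m] + 1e-6) for m in u}
total = sum(raw.values())
weights = {m: r / total for m, r in raw.items()}
```

With these toy numbers, the depth modality receives the larger fusion weight despite its higher uncertainty, because its low dependency bias shows the network has been under-using it.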
Shicai Wei
University of Electronic Science and Technology of China, UESTC
multimodal learning
Kaijie Zhang
University of Electronic Science and Technology of China
Luyi Chen
University of Electronic Science and Technology of China
Tao He
UESTC
Image Retrieval · Computer Vision
Guiduo Duan
University of Electronic Science and Technology of China