Improving Multimodal Learning via Imbalanced Learning

📅 2025-07-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
In multimodal learning, conventional gradient balancing methods assume uniform optimization across modalities; however, this work argues that “balance” is suboptimal—uneven optimization better accommodates inherent inter-modal differences in prediction bias and variance. To address this, we propose Asymmetric Representation Learning (ARL), a general, parameter-free, and architecture-agnostic strategy. ARL leverages unimodal bias–variance decomposition to dynamically estimate and jointly optimize modality-specific contribution weights via an auxiliary regularizer. This is the first theoretical analysis rigorously establishing the superiority of uneven over balanced learning from a bias–variance trade-off perspective. Empirically, ARL consistently improves performance across multiple benchmark datasets, while remaining compatible with diverse fusion mechanisms and network architectures—demonstrating strong generalizability and practical applicability.

📝 Abstract
Multimodal learning often encounters the under-optimization problem and may perform worse than unimodal learning. Existing approaches attribute this issue to imbalanced learning across modalities and tend to address it through gradient balancing. However, this paper argues that balanced learning is not the optimal setting for multimodal learning. With a bias–variance analysis, we prove that an imbalanced dependency on each modality, obeying the inverse ratio of their variances, contributes to optimal performance. To this end, we propose the Asymmetric Representation Learning (ARL) strategy to assist multimodal learning via imbalanced optimization. ARL introduces auxiliary regularizers for each modality encoder to calculate its prediction variance. ARL then computes coefficients from the unimodal variances to re-weight the optimization of each modality, forcing the modality dependence ratio to be inversely proportional to the modality variance ratio. Moreover, to minimize the generalization error, ARL further introduces the prediction bias of each modality and jointly optimizes it with the multimodal loss. Notably, all auxiliary regularizers share parameters with the multimodal model and rely only on the modality representations. Thus, the proposed ARL strategy introduces no extra parameters and is independent of the structure and fusion method of the multimodal model. Finally, extensive experiments on various datasets validate the effectiveness and versatility of ARL. Code is available at https://github.com/shicaiwei123/ICCV2025-ARL
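The re-weighting rule described above (modality coefficients inversely proportional to prediction variance, with bias terms jointly optimized alongside the multimodal loss) can be sketched as follows. The function names, the `eps` floor, and the exact form of the regularizer are illustrative assumptions for exposition, not the authors' implementation:

```python
# Illustrative sketch of ARL-style inverse-variance re-weighting.
# Names and the exact loss form are assumptions, not the paper's code.

def arl_weights(variances, eps=1e-8):
    """Normalized weights inversely proportional to each modality's
    prediction variance: low-variance modalities get larger weights."""
    inv = [1.0 / max(v, eps) for v in variances]
    total = sum(inv)
    return [w / total for w in inv]

def arl_total_loss(multimodal_loss, unimodal_losses, variances, biases, lam=1.0):
    """Multimodal loss plus variance-weighted unimodal regularizers.

    Each unimodal term couples that modality's loss with its prediction
    bias, so the dependence ratio across modalities follows the inverse
    of their variance ratio.
    """
    weights = arl_weights(variances)
    reg = sum(w * (l + b) for w, l, b in zip(weights, unimodal_losses, biases))
    return multimodal_loss + lam * reg

# Example: with variances 1.0 and 3.0, the low-variance modality
# receives the larger share of the optimization weight.
w = arl_weights([1.0, 3.0])  # approximately [0.75, 0.25]
```

In a real training loop the variance and bias estimates would come from the auxiliary unimodal heads that share parameters with the multimodal model, as the abstract describes.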
Problem

Research questions and friction points this paper is trying to address.

Multimodal learning is often under-optimized and can underperform unimodal learning
Existing gradient-balancing methods assume balanced optimization across modalities is ideal
Whether imbalanced modality dependency can instead yield better performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

ARL strategy for imbalanced multimodal optimization
Auxiliary regularizers estimate per-modality prediction variance and bias
Parameter-free; independent of model architecture and fusion method