Mitigating Modality Imbalance in Multi-modal Learning via Multi-objective Optimization

📅 2025-11-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the performance degradation caused by imbalanced learning across modalities, this paper formulates modality imbalance as a multi-objective optimization problem for the first time and proposes an efficient gradient-driven algorithm to solve it. Methodologically, the authors design a lightweight optimization framework with theoretical convergence guarantees that eliminates the costly subroutine calls of conventional multi-objective methods, reducing their computation time by up to 20x. By integrating multi-objective optimization principles with multimodal feature alignment, the approach dynamically balances inter-modal learning without increasing model complexity. Extensive experiments on mainstream multimodal benchmarks, including MM-IMDB and CMU-MOSEI, demonstrate consistent gains over existing balanced-learning and multi-objective optimization baselines, validating both effectiveness and generalizability.

📝 Abstract
Multi-modal learning (MML) aims to integrate information from multiple modalities, which is expected to lead to superior performance over single-modality learning. However, recent studies have shown that MML can underperform, even compared to single-modality approaches, due to imbalanced learning across modalities. Methods have been proposed to alleviate this imbalance issue using different heuristics, which often lead to computationally intensive subroutines. In this paper, we reformulate the MML problem as a multi-objective optimization (MOO) problem that overcomes the imbalanced learning issue among modalities, and we propose a gradient-based algorithm to solve the modified MML problem. We provide convergence guarantees for the proposed method, and empirical evaluations on popular MML benchmarks showcase its improved performance over existing balanced MML and MOO baselines, with up to ~20x reduction in subroutine computation time. Our code is available at https://github.com/heshandevaka/MIMO.
Problem

Research questions and friction points this paper is trying to address.

Addresses modality imbalance in multi-modal learning
Proposes multi-objective optimization for balanced learning
Reduces computation time while improving performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reformulating multi-modal learning as multi-objective optimization
Using a gradient-based algorithm to resolve modality imbalance
Achieving a significant reduction in subroutine computation time
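To make the multi-objective reformulation concrete, the sketch below applies the classic two-objective min-norm (MGDA-style) update to two per-modality losses sharing one set of weights: at each step it finds the convex combination of the two modality gradients with smallest norm and descends along it, so neither modality's objective dominates the update. This is a generic illustration of gradient-based MOO, not the paper's MIMO algorithm; the toy quadratic losses `grad_a`/`grad_b` are invented for demonstration.

```python
import numpy as np

def min_norm_direction(g1, g2):
    """Min-norm convex combination of two per-modality gradients
    (closed-form MGDA solution for two objectives)."""
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:
        alpha = 0.5  # gradients coincide; any weighting works
    else:
        alpha = float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))
    return alpha * g1 + (1.0 - alpha) * g2

# Toy per-modality objectives with conflicting minima (hypothetical):
# L_a(w) = ||w - [1, 0]||^2 / 2,  L_b(w) = ||w - [0, 1]||^2 / 2
def grad_a(w):
    return w - np.array([1.0, 0.0])

def grad_b(w):
    return w - np.array([0.0, 1.0])

w = np.zeros(2)
for _ in range(200):
    d = min_norm_direction(grad_a(w), grad_b(w))
    w -= 0.1 * d
# w converges to [0.5, 0.5], a Pareto-stationary point where the
# min-norm combination of the two modality gradients vanishes.
```

Because the update direction is a descent direction for both objectives (or zero at Pareto stationarity), neither modality is starved of learning signal, which is the balancing behavior the reformulation targets; the paper's contribution is achieving this without the costly min-norm subroutines that general MOO solvers require.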