Rebalanced Multimodal Learning with Data-aware Unimodal Sampling

📅 2025-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the modality imbalance that arises in multimodal learning when every modality is sampled equally despite carrying different amounts of information, this paper proposes a data-aware dynamic unimodal sampling framework. Methodologically, it introduces (1) Cumulative Modality Discrepancy (CMD), the first differentiable and monitorable metric for quantifying modality imbalance; (2) an adaptive sampling strategy driven by heuristic scheduling and by Proximal Policy Optimization (PPO)-based reinforcement learning, enabling end-to-end optimization of the sampling process; and (3) a plug-and-play module that integrates with mainstream multimodal architectures. Evaluated on multiple benchmark datasets, the framework achieves an average accuracy improvement of 2.3% over state-of-the-art methods, indicating that regulating modality balance at the data sampling stage is critical to model performance.

📝 Abstract
To address the modality learning degeneration caused by modality imbalance, existing multimodal learning (MML) approaches primarily attempt to balance the optimization process of each modality from the perspective of model learning. However, almost all existing methods ignore the modality imbalance caused by unimodal data sampling: sampling the same amount of data from each modality often yields discrepancies in informational content, which in turn leads to modality imbalance. Therefore, in this paper, we propose a novel MML approach called Data-aware Unimodal Sampling (DUS), which aims to dynamically alleviate the modality imbalance caused by sampling. Specifically, we first propose a novel cumulative modality discrepancy to monitor the multimodal learning process. Based on the learning status, we propose both a heuristic and a reinforcement learning (RL)-based data-aware unimodal sampling approach to adaptively determine the quantity of sampled data at each iteration, thus alleviating the modality imbalance from the perspective of sampling. Meanwhile, our method can be seamlessly incorporated into almost all existing multimodal learning approaches as a plugin. Experiments demonstrate that DUS achieves the best performance compared with diverse state-of-the-art (SOTA) baselines.
Problem

Research questions and friction points this paper is trying to address.

Addresses modality imbalance in multimodal learning
Proposes data-aware unimodal sampling to balance modalities
Enhances performance by integrating with existing MML methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic unimodal sampling to balance modalities
Cumulative modality discrepancy for learning monitoring
Reinforcement learning for adaptive data sampling
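The heuristic variant can be pictured with a minimal sketch. The summary does not give the paper's formulas, so everything below is an assumption made for illustration: the class `DiscrepancyMonitor`, the function `heuristic_sample_sizes`, the parameters `min_fraction` and `sensitivity`, the use of a running mean of score gaps as the discrepancy, and the rule that the leading modality receives fewer samples. It shows only the general shape of the idea, a monitored discrepancy driving the per-modality sample count at each iteration, and is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class DiscrepancyMonitor:
    """Running mean of the gap between two unimodal scores.

    Hypothetical stand-in for the cumulative modality discrepancy;
    the paper's actual definition is not given in this summary.
    """
    total: float = 0.0
    steps: int = 0

    def update(self, score_a: float, score_b: float) -> float:
        # A positive result means modality A has been ahead on average so far.
        self.total += score_a - score_b
        self.steps += 1
        return self.total / self.steps


def heuristic_sample_sizes(discrepancy: float, batch_size: int,
                           min_fraction: float = 0.25,
                           sensitivity: float = 1.0) -> tuple[int, int]:
    """Map a discrepancy to (samples for A, samples for B) for one iteration.

    Assumed rule: the leading modality is given fewer samples, while the
    lagging modality keeps the full batch.
    """
    shrink = max(min_fraction, 1.0 - sensitivity * abs(discrepancy))
    reduced = max(1, round(batch_size * shrink))
    if discrepancy > 0:
        return reduced, batch_size
    if discrepancy < 0:
        return batch_size, reduced
    return batch_size, batch_size


# Usage: made-up per-iteration scores for two modalities.
monitor = DiscrepancyMonitor()
for score_a, score_b in [(0.9, 0.5), (0.8, 0.6)]:
    gap = monitor.update(score_a, score_b)
    n_a, n_b = heuristic_sample_sizes(gap, batch_size=32)
```

In the RL-based variant described in the abstract, a learned policy would take the place of the fixed rule in `heuristic_sample_sizes`; that part is not sketched here.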
Qingyuan Jiang
School of Computer Science and Engineering, Nanjing University of Science and Technology
Zhouyang Chi
School of Computer Science and Engineering, Nanjing University of Science and Technology
Xiao Ma
School of Computer Science and Technology, Zhejiang Sci-Tech University
Qirong Mao
Jiangsu University
AI, Multimedia
Yang Yang
School of Computer Science and Engineering, Nanjing University of Science and Technology
Jinhui Tang
School of Computer Science and Engineering, Nanjing University of Science and Technology