🤖 AI Summary
In multimodal learning, models often over-rely on dominant modalities, leading to underutilization of weak modalities and degraded generalization. To address this modality imbalance, we propose a semantic-inconsistency-driven data augmentation framework. First, cross-modal misaligned samples are generated based on unimodal confidence scores. Second, a dynamic weighting mechanism jointly optimizes both the contribution weights of weak modalities and the sampling weights for hard examples. Third, a feature-similarity-guided hard-example prioritization strategy is introduced to enhance discriminative learning. The method requires no additional annotations and effectively mitigates modality bias. It significantly improves model robustness against ambiguous or noisy inputs. Evaluated on major multimodal classification benchmarks—including MM-IMDB and CMU-MOSEI—our approach achieves state-of-the-art performance, demonstrating its effectiveness in balancing modality contributions and strengthening weak-modality representations.
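The first step above — generating cross-modal misaligned samples and labeling them from unimodal confidence scores — can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's exact procedure: the function name `make_misaligned`, the permutation-based pairing, and the relative-confidence soft label are all assumptions made for the sketch.

```python
import torch

def make_misaligned(x_a, x_b, conf_a, conf_b, y):
    """Illustrative sketch of misaligned-sample generation (assumed form).

    x_a, x_b : features of modalities A and B, shape (N, D)
    conf_a, conf_b : unimodal confidence scores, shape (N,)
    y : one-hot labels, shape (N, C)
    """
    # Pair modality A of sample i with modality B of a different sample,
    # producing semantically inconsistent cross-modal inputs.
    perm = torch.randperm(x_a.size(0))
    x_b_mis = x_b[perm]
    # Soft label: weight each source label by its modality's relative
    # confidence, so no extra annotations are needed (assumed labeling rule).
    w = conf_a / (conf_a + conf_b[perm] + 1e-8)
    y_mis = w.unsqueeze(1) * y + (1.0 - w).unsqueeze(1) * y[perm]
    return x_a, x_b_mis, y_mis
```

Because the soft label is a convex combination of two one-hot labels, each row still sums to one and can be used directly with a cross-entropy-style loss.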
📝 Abstract
Multimodal models often over-rely on dominant modalities, failing to achieve optimal performance. While prior work focuses on modifying training objectives or optimization procedures, data-centric solutions remain underexplored. We propose MIDAS, a novel data augmentation strategy that generates misaligned samples with semantically inconsistent cross-modal information, labeled using unimodal confidence scores to compel learning from contradictory signals. However, this confidence-based labeling can still favor the more confident modality. To address this within our misaligned samples, we introduce weak-modality weighting, which dynamically increases the loss weight of the least confident modality, helping the model fully exploit the weaker modality. Furthermore, misaligned samples whose features closely resemble the aligned features are more challenging, and training on them enables the model to better distinguish between classes. To leverage this, we propose hard-sample weighting, which prioritizes such semantically ambiguous misaligned samples. Experiments on multiple multimodal classification benchmarks demonstrate that MIDAS significantly outperforms related baselines in addressing modality imbalance.
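The two weighting schemes in the abstract can be sketched as below. This is a hedged approximation under stated assumptions: the function name, the temperature `tau`, the use of max softmax probability as the confidence score, and the sigmoid-of-similarity hardness weight are all choices made for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def modality_and_hardness_weights(logits_a, logits_b, feat_mis, feat_aligned, tau=0.5):
    """Illustrative weak-modality and hard-sample weights (assumed form).

    logits_a, logits_b : unimodal classifier logits, shape (N, C)
    feat_mis, feat_aligned : fused features of misaligned / aligned samples, (N, D)
    """
    # Confidence per modality: max softmax probability (assumed score).
    conf_a = F.softmax(logits_a, dim=1).amax(dim=1)
    conf_b = F.softmax(logits_b, dim=1).amax(dim=1)
    conf = torch.stack([conf_a, conf_b], dim=1)  # (N, 2)
    # Weak-modality weighting: the *less* confident modality gets the
    # larger loss weight, so the model is pushed to exploit it.
    weak_w = F.softmax(-conf / tau, dim=1)
    # Hard-sample weighting: misaligned features that are *more* similar
    # to the aligned features are harder, so they get a larger weight.
    sim = F.cosine_similarity(feat_mis, feat_aligned, dim=1)
    hard_w = torch.sigmoid(sim / tau)
    return weak_w, hard_w
```

In use, `weak_w` would rescale the per-modality loss terms on misaligned samples and `hard_w` would rescale the per-sample loss, so both biases are corrected without any extra annotations.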