Multimodal Negative Learning

📅 2025-10-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In multimodal learning, dominant modalities often suppress weaker ones, leading to loss of modality-specific information. Conventional alignment methods, which enforce semantic equivalence ("learning to be the same"), tend to cause overfitting and information suppression. To address this, we propose a "Learning Not to be" negative learning paradigm: the dominant modality dynamically guides the weaker modality to suppress non-target-class responses, stabilizing decision boundaries without enforcing explicit semantic alignment and thereby preserving modality uniqueness. We establish, for the first time, a theoretical framework for multimodal negative learning from a robustness perspective, deriving a joint optimization objective over empirical error and confidence, and we further design a dynamic negative-guidance mechanism. Extensive experiments on multiple benchmark datasets demonstrate that our method significantly outperforms state-of-the-art approaches under modality imbalance and noisy conditions, while exhibiting superior generalization and robustness.
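To make the mechanism concrete, the following is a minimal PyTorch-style sketch of what such a negative-learning objective could look like: the dominant modality's prediction selects the non-target (complementary) classes, and the weak modality is penalized for placing probability mass on them, weighted by the dominant modality's confidence as a simple stand-in for dynamic guidance. All names here (negative_learning_loss, weak_logits, dominant_logits) are hypothetical, and this is an illustration of the general idea rather than the paper's exact formulation.

import torch
import torch.nn.functional as F

def negative_learning_loss(weak_logits, dominant_logits, eps=1e-7):
    # Hedged sketch of "Learning Not to be": push the weak modality's
    # probability mass away from classes the dominant modality rules out.
    with torch.no_grad():
        dom_probs = F.softmax(dominant_logits, dim=-1)
        # Pseudo target: the class the dominant modality is most confident about.
        pseudo_target = dom_probs.argmax(dim=-1)
        # Dynamic guidance weight: trust the dominant modality more when it is confident.
        guidance = dom_probs.max(dim=-1).values            # shape: (batch,)
    weak_probs = F.softmax(weak_logits, dim=-1)
    # Mask selecting the non-target (complementary) classes for each sample.
    non_target = torch.ones_like(weak_probs)
    non_target.scatter_(1, pseudo_target.unsqueeze(1), 0.0)
    # Negative learning: maximize log(1 - p_k) for every non-target class k.
    nl_terms = -torch.log(1.0 - weak_probs + eps) * non_target
    per_sample = nl_terms.sum(dim=-1)
    return (guidance * per_sample).mean()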

📝 Abstract
Multimodal learning systems often encounter challenges related to modality imbalance, where a dominant modality may overshadow others, thereby hindering the learning of weak modalities. Conventional approaches often force weak modalities to align with dominant ones in "Learning to be (the same)" (Positive Learning), which risks suppressing the unique information inherent in the weak modalities. To address this challenge, we offer a new learning paradigm: "Learning Not to be" (Negative Learning). Instead of enhancing weak modalities' target-class predictions, the dominant modalities dynamically guide the weak modality to suppress non-target classes. This stabilizes the decision space and preserves modality-specific information, allowing weak modalities to preserve unique information without being over-aligned. We proceed to reveal multimodal learning from a robustness perspective and theoretically derive the Multimodal Negative Learning (MNL) framework, which introduces a dynamic guidance mechanism tailored for negative learning. Our method provably tightens the robustness lower bound of multimodal learning by increasing the Unimodal Confidence Margin (UCoM) and reduces the empirical error of weak modalities, particularly under noisy and imbalanced scenarios. Extensive experiments across multiple benchmarks demonstrate the effectiveness and generalizability of our approach against competing methods. The code will be available at https://github.com/BaoquanGong/Multimodal-Negative-Learning.git.
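For orientation, a unimodal confidence margin of the kind the abstract refers to is commonly defined as the gap between the probability assigned to the target class and the largest non-target probability; the definition below is a standard one offered only as an assumption, and the paper's exact UCoM may differ.

\[
  m_u(x, y) \;=\; p_u(y \mid x) \;-\; \max_{k \neq y} p_u(k \mid x)
\]

Here p_u denotes the predictive distribution of the unimodal branch u. A larger margin means the unimodal decision is farther from flipping under perturbation or noise, which is the sense in which enlarging it can tighten a robustness lower bound.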
Problem

Research questions and friction points this paper is trying to address.

Addressing modality imbalance in multimodal learning systems
Preserving unique information in weak modalities against over-alignment
Enhancing robustness under noisy and imbalanced data scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces a dynamic guidance mechanism tailored for negative learning (a hedged training-step sketch follows after this list)
Uses the dominant modality to guide the weak modality in suppressing non-target classes
Increases the Unimodal Confidence Margin (UCoM) to tighten the robustness lower bound
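As a rough illustration of how such dynamic guidance could sit inside an ordinary training step, the sketch below combines standard cross-entropy on both branches with the illustrative negative_learning_loss from the summary above. Treating one branch as the fixed dominant modality and using a constant weight lam are simplifying assumptions, not the paper's prescribed schedule.

import torch
import torch.nn.functional as F

def training_step(model_a, model_b, x_a, x_b, labels, optimizer, lam=1.0):
    # Hypothetical step: modality A is treated as dominant, B as weak.
    logits_a = model_a(x_a)          # dominant-modality logits
    logits_b = model_b(x_b)          # weak-modality logits
    # Cross-entropy supervises both branches on the target class.
    ce = F.cross_entropy(logits_a, labels) + F.cross_entropy(logits_b, labels)
    # Negative-learning term (see the earlier sketch) guides the weak branch
    # to suppress classes the dominant branch rules out.
    nl = negative_learning_loss(logits_b, logits_a)
    loss = ce + lam * nl
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()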
🔎 Similar Papers
No similar papers found.
Baoquan Gong
School of Artificial Intelligence, Tianjin University, Tianjin, China
Xiyuan Gao
School of Artificial Intelligence, Tianjin University, Tianjin, China
Pengfei Zhu
School of Artificial Intelligence, Tianjin University, Tianjin, China
Qinghua Hu
Professor of Computer Science, Tianjin University
Machine Learning, Data Mining
Bing Cao
Professor, College of Intelligence and Computing, Tianjin University
Multimodal Learning, Computer Vision