Adaptive Redundancy Regulation for Balanced Multimodal Information Refinement

📅 2025-11-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
In multimodal joint training, the dominant modality tends to overpower backpropagation, causing imbalanced optimization. Two problems remain: (i) prolonged dominance weakens the coupling between representations and outputs late in training, so redundant information accumulates; and (ii) existing gradient regulation methods adjust the dominant modality's gradients directly and uniformly, neglecting inter-modal semantics and directionality. To address this, we propose Adaptive Redundancy Regulation (RedReg), a gradient regulation framework inspired by the information bottleneck principle. RedReg uses a redundancy phase monitor, built on a joint criterion of effective gain growth rate and redundancy, to trigger intervention only when redundancy is high, and a co-information gating mechanism to estimate the dominant modality's contribution from cross-modal semantics. When the task relies primarily on a single modality, the suppression term is disabled to preserve modality-specific information. Otherwise, the dominant modality's gradient is projected onto the orthogonal complement of the joint multimodal gradient subspace and suppressed according to redundancy, rather than scaled uniformly. Experiments show that RedReg outperforms current major methods in most scenarios, ablation experiments verify its effectiveness, and the code is publicly available.

📝 Abstract
Multimodal learning aims to improve performance by leveraging data from multiple sources. During joint multimodal training, modality bias often lets the advantaged modality dominate backpropagation, leading to imbalanced optimization. Existing methods still face two problems. First, the long-term dominance of the advantaged modality weakens representation-output coupling in the late stages of training, resulting in the accumulation of redundant information. Second, previous methods often adjust the gradients of the advantaged modality directly and uniformly, ignoring the semantics and directionality between modalities. To address these limitations, we propose Adaptive Redundancy Regulation for Balanced Multimodal Information Refinement (RedReg), which is inspired by the information bottleneck principle. Specifically, we construct a redundancy phase monitor that uses a joint criterion of effective gain growth rate and redundancy to trigger intervention only when redundancy is high. Furthermore, we design a co-information gating mechanism to estimate the contribution of the current dominant modality based on cross-modal semantics. When the task primarily relies on a single modality, the suppression term is automatically disabled to preserve modality-specific information. Finally, we project the gradient of the dominant modality onto the orthogonal complement of the joint multimodal gradient subspace and suppress the gradient according to redundancy. Experiments show that our method outperforms current major methods in most scenarios. Ablation experiments verify the effectiveness of our method. The code is available at https://github.com/xia-zhe/RedReg.git
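
The final step described in the abstract, projecting the dominant modality's gradient onto the orthogonal complement of the joint gradient subspace, can be illustrated with a minimal Python sketch. This is not the RedReg implementation. It assumes the joint subspace is spanned by a single joint gradient vector, and it assumes that "suppress according to redundancy" means blending the raw gradient with its projection using a strength in [0, 1]. The names `orthogonal_component`, `regulate`, and `strength` are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors given as lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def orthogonal_component(g_dom, g_joint, eps=1e-12):
    """Part of g_dom that is orthogonal to g_joint.

    Assumption: the joint gradient subspace is one-dimensional,
    spanned by g_joint. If g_joint is (numerically) zero there is
    no direction to remove, so g_dom is returned unchanged.
    """
    norm_sq = dot(g_joint, g_joint)
    if norm_sq < eps:
        return list(g_dom)
    coef = dot(g_dom, g_joint) / norm_sq
    return [gd - coef * gj for gd, gj in zip(g_dom, g_joint)]


def regulate(g_dom, g_joint, strength):
    """Blend the raw dominant gradient with its orthogonal projection.

    strength = 0 leaves the gradient untouched; strength = 1 keeps
    only the orthogonal-complement component. How strength is derived
    from redundancy is an assumption left to the caller.
    """
    s = min(max(strength, 0.0), 1.0)
    g_perp = orthogonal_component(g_dom, g_joint)
    return [(1.0 - s) * gd + s * gp for gd, gp in zip(g_dom, g_perp)]
```

With `g_dom = [3.0, 4.0]` and `g_joint = [1.0, 0.0]`, the projection is `[0.0, 4.0]`, so a strength of 0.5 halves only the component along the joint direction and leaves the orthogonal component intact. This is the sense in which the adjustment is directional rather than a uniform rescaling.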
Problem

Research questions and friction points this paper is trying to address.

Addresses modality bias causing imbalanced optimization in multimodal learning
Reduces redundant information accumulation from dominant modality during training
Adaptively regulates gradients considering cross-modal semantics and redundancy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Redundancy phase monitor triggers adaptive intervention
Co-information gating preserves modality-specific information
Orthogonal gradient projection suppresses redundant information
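
How the monitor and the gate listed above might combine into a single suppression strength can be sketched in a few lines of Python. Everything here is an illustrative assumption and not the paper's formulation: the trigger rule (gain growth has plateaued and redundancy is high), the thresholds `growth_tol` and `redundancy_thresh`, the scale `co_info_scale`, and the function names are all hypothetical. The only standard ingredient is the co-information identity I(A;B;Y) = I(A;Y) + I(B;Y) - I(A,B;Y), with the mutual-information estimates supplied by the caller.

```python
def co_information(mi_a_y, mi_b_y, mi_ab_y):
    """Co-information I(A;B;Y) from three mutual-information estimates.

    Positive values indicate information about Y that is shared
    (redundant) between modalities A and B.
    """
    return mi_a_y + mi_b_y - mi_ab_y


def suppression_strength(gain_growth, redundancy, co_info,
                         growth_tol=0.01, redundancy_thresh=0.5,
                         co_info_scale=1.0):
    """Hypothetical strength in [0, 1] for suppressing the dominant gradient.

    Phase monitor (assumed rule): intervene only when the effective
    gain has stopped growing and redundancy is high.
    Gate (assumed rule): scale by clamped co-information, so that a
    task relying on one modality (co-information near zero) switches
    the suppression off.
    """
    if gain_growth > growth_tol or redundancy < redundancy_thresh:
        return 0.0
    gate = min(max(co_info / co_info_scale, 0.0), 1.0)
    return min(max(gate * redundancy, 0.0), 1.0)
```

Under these assumptions, a call such as `suppression_strength(0.0, 0.8, 0.0)` returns zero even though redundancy is high, which mirrors the stated behavior of disabling suppression when a single modality carries the task.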
Zhe Yang
Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China, and also with Harbin Institute of Technology Zhengzhou Research Institute, Zhengzhou 450000, China
Wenrui Li
Assistant Professor, University of Connecticut
Statistics, Network science, Biostatistics
Hongtao Chen
School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China
Penghong Wang
Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China, and also with Harbin Institute of Technology Suzhou Research Institute, Suzhou 215104, China
Ruiqin Xiong
Peking University
Video coding, image and video processing
Xiaopeng Fan
Professor, Harbin Institute of Technology
Video/Image, Wireless