FANoise: Singular Value-Adaptive Noise Modulation for Robust Multimodal Representation Learning

📅 2025-11-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods for improving robustness in multimodal representation learning typically rely on static or heuristic noise injection and neglect how feature distributions evolve during training. This paper proposes FANoise, an adaptive noise injection framework grounded in two perspectives: gradient dynamics and feature distribution statistics. Its core innovation is a singular-value-adaptive mechanism that modulates noise intensity according to the spectral properties of encoder features, with the aim of strengthening regularization while preserving training stability. FANoise is integrated into the InfoNCE contrastive learning framework, so the noise level is driven by the data rather than fixed in advance. Experiments across multiple vision-language models show that FANoise consistently improves generalization on cross-modal retrieval and understanding tasks. The method applies across architectures, and its spectrum-aware design makes its behavior theoretically interpretable.
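The summary does not give the exact modulation rule, so the following is only a minimal sketch of what spectrum-dependent noise scaling could look like. It assumes a batch of encoder features stored as a list of rows, estimates the largest squared singular value by power iteration, and lowers a base noise level as that value dominates the spectrum. The names `spectral_concentration`, `adaptive_sigma`, and `base_sigma`, as well as the linear rule itself, are hypothetical and not taken from the paper.

```python
import math
import random


def top_singular_value_sq(features, iters=200, seed=0):
    """Largest squared singular value of an n x d matrix F,
    estimated by power iteration on F^T F."""
    n, d = len(features), len(features[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    v_norm = math.sqrt(sum(x * x for x in v))
    v = [x / v_norm for x in v]
    eig = 0.0
    for _ in range(iters):
        # w = F^T (F v)
        fv = [sum(row[j] * v[j] for j in range(d)) for row in features]
        w = [sum(features[i][j] * fv[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        eig = norm  # for unit v, ||F^T F v|| approaches the top eigenvalue
        v = [x / norm for x in w]
    return eig


def spectral_concentration(features):
    """Share of total spectral energy held by the top singular value."""
    # The squared Frobenius norm equals the sum of squared singular values.
    total = sum(x * x for row in features for x in row)
    if total == 0.0:
        return 1.0
    return top_singular_value_sq(features) / total


def adaptive_sigma(features, base_sigma=0.1):
    """Hypothetical rule (an assumption, not the paper's): inject less
    noise when one direction dominates the feature spectrum."""
    return max(0.0, base_sigma * (1.0 - spectral_concentration(features)))
```

Under this placeholder rule, rank-one features receive no noise and features with a flat spectrum receive the most. The direction and shape of the mapping are design choices that the paper may resolve differently.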

📝 Abstract
Representation learning is fundamental to modern machine learning, powering applications such as text retrieval and multimodal understanding. However, learning robust and generalizable representations remains challenging. While prior work has demonstrated that active noise injection, a form of data augmentation, can enhance encoding performance, most existing methods rely on heuristic or static noise, overlooking the dynamic nature of feature distributions during training. In this work, we systematically study the role of noise in representation learning from both gradient-based and feature distribution perspectives, using InfoNCE loss as a representative example. Focusing on multimodal representation learning, we propose FANoise, a novel feature-adaptive noise injection strategy. By leveraging the dynamics of contrastive learning, FANoise effectively mitigates the negative impacts of noise while preserving its benefits. Under this theoretically grounded framework, comprehensive experiments demonstrate that FANoise consistently improves overall performance on multimodal tasks across various base vision-language models (VLMs).
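For readers unfamiliar with the setup, the sketch below shows a plain InfoNCE loss with optional Gaussian noise added to the query features before normalization. It illustrates static noise injection, the baseline the abstract contrasts against, not FANoise itself. The function names, the `sigma` parameter, and the choice to perturb only the queries are assumptions made for illustration.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def add_gaussian_noise(v, sigma, rng):
    """Add i.i.d. zero-mean Gaussian noise with standard deviation sigma."""
    return [x + rng.gauss(0.0, sigma) for x in v]


def info_nce(queries, keys, temperature=0.07, sigma=0.0, rng=None):
    """Mean InfoNCE loss where queries[i] and keys[i] form the positive
    pair and every other key in the batch acts as a negative."""
    rng = rng or random.Random(0)
    q = [l2_normalize(add_gaussian_noise(v, sigma, rng)) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity to every key, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp over all keys.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(q)
```

With `sigma=0.0` this reduces to the standard loss. A fixed positive `sigma` corresponds to the static noise setting, and a feature-adaptive method would instead choose the noise level from the features during training.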
Problem

Research questions and friction points this paper is trying to address.

Develops adaptive noise injection for robust multimodal representation learning
Addresses static noise limitations in dynamic feature distributions during training
Enhances performance across various vision-language models within a theoretically grounded framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feature-adaptive noise injection strategy
Leverages contrastive learning dynamics
Improves multimodal task performance
🔎 Similar Papers
Jiaoyang Li
Assistant Professor at Robotics Institute, Carnegie Mellon University
Artificial Intelligence, Multi-Agent/Robot Systems, Heuristic Search, Automated Planning
Jun Fang
JD, Retail, Beijing, China
Tianhao Gao
JD, Retail, Beijing, China
Xiaohui Zhang
JD, Retail, Beijing, China
Zhiyuan Liu
JD, Retail, Beijing, China
Chao Liu
JD, Retail, Beijing, China
Pengzhang Liu
JD, Retail, Beijing, China
Qixia Jiang
JD, Retail, Beijing, China