Spectral Disentanglement and Enhancement: A Dual-domain Contrastive Framework for Representation Learning

📅 2026-02-09

📈 Citations: 0

✨ Influential: 0

career value

222K/year

🤖 AI Summary

This work addresses a critical limitation in existing large-scale multimodal contrastive learning methods: the neglect of the intrinsic spectral structure of embedding features, which concentrates semantic information in a few dominant subspaces while leaving other dimensions vulnerable to noise and spurious correlations, thereby impairing generalization. To remedy this, we propose the Spectral Disentanglement and Enhancement (SDE) framework—the first to integrate spectral analysis into contrastive learning. SDE adaptively partitions features via singular value decomposition into strong-signal, weak-signal, and noise subspaces, and employs a curriculum-based spectral augmentation strategy to amplify informative components. Furthermore, it introduces a dual-domain contrastive loss operating in both feature and spectral domains to jointly optimize representation alignment and spectral regularization. Evaluated on major multimodal benchmarks, SDE significantly outperforms state-of-the-art methods, enhancing robustness and generalization while seamlessly integrating into existing contrastive learning pipelines.

Technology Category

Application Category

📝 Abstract

Large-scale multimodal contrastive learning has recently achieved impressive success in learning rich and transferable representations, yet it remains fundamentally limited by the uniform treatment of feature dimensions and the neglect of the intrinsic spectral structure of the learned features. Empirical evidence indicates that high-dimensional embeddings tend to collapse into narrow cones, concentrating task-relevant semantics in a small subspace, while the majority of dimensions remain occupied by noise and spurious correlations. Such spectral imbalance and entanglement undermine model generalization. We propose Spectral Disentanglement and Enhancement (SDE), a novel framework that bridges the gap between the geometry of the embedded spaces and their spectral properties. Our approach leverages singular value decomposition to adaptively partition feature dimensions into strong signals that capture task-critical semantics, weak signals that reflect ancillary correlations, and noise representing irrelevant perturbations. A curriculum-based spectral enhancement strategy is then applied, selectively amplifying informative components with theoretical guarantees on training stability. Building upon the enhanced features, we further introduce a dual-domain contrastive loss that jointly optimizes alignment in both the feature and spectral spaces, effectively integrating spectral regularization into the training process and encouraging richer, more robust representations. Extensive experiments on large-scale multimodal benchmarks demonstrate that SDE consistently improves representation robustness and generalization, outperforming state-of-the-art methods. SDE integrates seamlessly with existing contrastive pipelines, offering an effective solution for multimodal representation learning.

Problem

Research questions and friction points this paper is trying to address.

spectral disentanglement

representation learning

contrastive learning

spectral imbalance

feature collapse

Innovation

Methods, ideas, or system contributions that make the work stand out.

spectral disentanglement

contrastive learning

singular value decomposition