AI Summary
Representation learning faces the fundamental challenge of jointly optimizing generative fidelity and discriminative performance. This paper proposes the contrastive Mutual Information Machine (cMIM), a robust representation learning framework that eliminates the need for positive-sample augmentation and is far less sensitive to batch size. cMIM unifies generative reconstruction and global discriminative structure through probabilistic modeling, jointly optimizing mutual-information maximization and a contrastive learning objective. It also introduces informative embeddings, a technique that directly enhances the discriminative capability of encoder-decoder architectures without incurring additional training overhead. Evaluated across multiple vision and molecular benchmarks, cMIM consistently outperforms both the Mutual Information Machine (MIM) and InfoNCE, achieving substantial gains on classification and regression tasks while preserving high-fidelity reconstructions. These results empirically support cMIM's capacity for unified multi-task representation learning.
Abstract
Learning representations that transfer well to diverse downstream tasks remains a central challenge in representation learning. Existing paradigms -- contrastive learning, self-supervised masking, and denoising auto-encoders -- balance this challenge with different trade-offs. We introduce the contrastive Mutual Information Machine (cMIM), a probabilistic framework that extends the Mutual Information Machine (MIM) with a contrastive objective. While MIM maximizes mutual information between inputs and latents and promotes clustering of codes, it falls short on discriminative tasks. cMIM addresses this gap by imposing global discriminative structure while retaining MIM's generative fidelity. Our contributions are threefold. First, we propose cMIM, a contrastive extension of MIM that removes the need for positive data augmentation and is substantially less sensitive to batch size than InfoNCE. Second, we introduce informative embeddings, a general technique for extracting enriched features from encoder-decoder models that boosts discriminative performance without additional training and applies broadly beyond MIM. Third, we provide empirical evidence across vision and molecular benchmarks showing that cMIM outperforms MIM and InfoNCE on classification and regression tasks while preserving competitive reconstruction quality. These results position cMIM as a unified framework for representation learning, advancing the goal of models that serve both discriminative and generative applications effectively.
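The abstract does not spell out the training objective, but the core idea it describes, pairing a reconstruction (generative) term with an augmentation-free contrastive term over latent codes, can be sketched in a toy form. The snippet below is an illustrative NumPy sketch, not the paper's actual loss: the linear encoder/decoder, the mean-squared-error reconstruction proxy, and the use of reconstruction quality as matching scores are all assumptions made for exposition. The "positive" for each latent is simply its own input, which is one way a contrastive term can avoid positive data augmentation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for learned networks: random linear encoder/decoder.
d_in, d_z, n = 8, 4, 16
W_enc = rng.normal(size=(d_in, d_z))
W_dec = rng.normal(size=(d_z, d_in))

x = rng.normal(size=(n, d_in))   # a batch of inputs
z = x @ W_enc                    # latent codes
x_hat = z @ W_dec                # reconstructions

# Generative term: mean squared reconstruction error
# (a crude proxy for MIM's likelihood-based objective).
recon_loss = np.mean((x - x_hat) ** 2)

# Contrastive term without augmentation: score every reconstruction
# against every input (negative squared distance as a logit), then ask
# each latent's reconstruction to "pick out" its own input.
scores = -((x_hat[:, None, :] - x[None, :, :]) ** 2).sum(-1)  # (n, n)

# Numerically stable log-softmax over each row of the score matrix.
m = scores.max(axis=1, keepdims=True)
lse = m + np.log(np.exp(scores - m).sum(axis=1, keepdims=True))
log_probs = scores - lse

# Cross-entropy on the diagonal (each latent matched to its own input).
contrastive_loss = -np.mean(np.diag(log_probs))

total_loss = recon_loss + contrastive_loss
```

Because negatives come from other latents already present in the batch rather than from augmented views, the scheme needs no positive-pair construction; the actual cMIM formulation and its batch-size behavior are detailed in the paper itself.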