A Survey on Data Curation for Visual Contrastive Learning: Why Crafting Effective Positive and Negative Pairs Matters

📅 2025-02-12
🤖 AI Summary
This work addresses the lack of systematic guidance in constructing positive and negative sample pairs for visual contrastive learning. We propose the first unified taxonomy for pair construction, rigorously delineating the applicability boundaries and coupling mechanisms among data augmentation, semantic alignment, and hard-negative mining paradigms. Our method integrates key techniques—including self-supervised augmentation, dynamic metric learning, memory-bank sampling, and cross-modal alignment—to establish a reproducible paired-evaluation benchmark and practical implementation guide. Extensive experiments demonstrate that the framework substantially improves representation quality, accelerates training convergence, enhances downstream transfer performance, and reduces computational overhead. By unifying theoretical analysis with engineering practice, our work provides both foundational principles and actionable methodologies for data governance in contrastive learning.
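The summary mentions hard-negative mining and memory-bank sampling. Here is a minimal sketch of the idea. It assumes a memory bank of previously computed embeddings, each tagged with a hypothetical instance id. Entries that share the query's id are other views of the same image, so they are excluded. The hardest negatives are then the remaining entries with the highest cosine similarity to the query. The function `mine_hard_negatives` and its parameters are illustrative only and are not the survey's method. Published approaches differ in how they maintain the bank and how many hard negatives they keep.

```python
import numpy as np


def mine_hard_negatives(query: np.ndarray, bank: np.ndarray,
                        bank_ids: np.ndarray, query_id: int,
                        k: int = 5) -> np.ndarray:
    """Indices of the k bank entries most similar to `query` whose id differs
    from `query_id` (illustrative sketch; `bank_ids` are assumed instance ids)."""
    # Cosine similarity between the query and every stored embedding.
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    sims = b @ q
    # Entries from the same instance are positives, not negatives: mask them out.
    is_negative = bank_ids != query_id
    sims = np.where(is_negative, sims, -np.inf)
    # Never return more candidates than there are valid negatives.
    k = min(k, int(is_negative.sum()))
    # Sort by descending similarity so the hardest negatives come first.
    return np.argsort(-sims, kind="stable")[:k]


# Usage with a toy 2-D memory bank; entry 0 is another view of the query.
bank = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0], [0.7, 0.7]])
bank_ids = np.array([0, 1, 2, 3, 4])
query = np.array([1.0, 0.0])
print(mine_hard_negatives(query, bank, bank_ids, query_id=0, k=2))  # → [1 4]
```

Masking by instance id only guards against the most obvious false negatives. Distinct images of the same class can still be selected as "hard" negatives, which is one reason careful pair curation matters.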

📝 Abstract
Visual contrastive learning aims to learn representations by contrasting similar (positive) and dissimilar (negative) pairs of data samples. The design of these pairs significantly impacts representation quality, training efficiency, and computational cost. A well-curated set of pairs leads to stronger representations and faster convergence. As contrastive pre-training sees wider adoption for solving downstream tasks, data curation becomes essential for optimizing its effectiveness. In this survey, we attempt to create a taxonomy of existing techniques for positive and negative pair curation in contrastive learning, and describe them in detail.
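The abstract describes learning by contrasting positive and negative pairs. One common way such pairs enter the training objective is the InfoNCE loss, sketched below with NumPy. The sketch assumes the SimCLR-style convention that row i of `positives` is the positive for row i of `anchors`, and that every other row in the batch acts as a negative. The function name `info_nce_loss`, the temperature of 0.1, and the Gaussian-noise "augmentations" in the usage lines are illustrative assumptions, not the survey's method.

```python
import numpy as np


def info_nce_loss(anchors: np.ndarray, positives: np.ndarray,
                  temperature: float = 0.1) -> float:
    """InfoNCE with in-batch negatives (illustrative sketch).

    Row i of `positives` is assumed to be the positive for row i of
    `anchors`; every other row j != i in the batch is used as a negative.
    """
    # L2-normalize so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    # (N, N) similarity matrix; the diagonal holds the positive pairs.
    logits = a @ p.T / temperature
    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the matching index i as the target class.
    return float(-np.mean(np.diag(log_probs)))


# Usage: two noisy copies of the same vectors stand in for two augmented views
# (a hypothetical substitute for real image augmentations).
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
view1 = z + 0.05 * rng.normal(size=z.shape)
view2 = z + 0.05 * rng.normal(size=z.shape)

aligned = info_nce_loss(view1, view2)                    # correct pairing
misaligned = info_nce_loss(view1, np.roll(view2, 1, 0))  # every "positive" is wrong
print(f"aligned={aligned:.4f}  misaligned={misaligned:.4f}")
```

The diagonal holds the positive pairs. Shifting `view2` by one row turns every positive into a mismatched pair, and the loss rises sharply. This illustrates how directly the choice of pairs shapes the training signal.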
Problem

Research questions and friction points this paper is trying to address.

Improving representation quality in visual contrastive learning
Optimizing data pair curation for training efficiency
Taxonomy of techniques for effective contrastive learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Visual contrastive learning techniques
Positive and negative pair curation
Taxonomy of data curation methods