LeOCLR: Leveraging Original Images for Contrastive Learning of Visual Representations

📅 2024-03-11
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
📄 PDF
🤖 AI Summary
Random cropping in contrastive learning often induces semantic inconsistency between the two views, degrading representation quality. To address this, we propose a novel contrastive learning paradigm that incorporates the original (uncropped) image: a dual-branch encoder processes one augmented view in one branch and the full original image in the other, while an enhanced InfoNCE loss explicitly enforces consistency between local cropped views and the global semantic context, mitigating the feature confusion caused by cropping distortion. This is the first work to integrate the original image into the instance discrimination framework without requiring additional annotations or computational overhead. In linear evaluation on ImageNet-1K, our method outperforms MoCo-v2 by 5.1%. It also achieves substantial gains over mainstream self-supervised baselines on transfer classification and object detection tasks.

📝 Abstract
Contrastive instance discrimination methods outperform supervised learning in downstream tasks such as image classification and object detection. However, these methods rely heavily on data augmentation during representation learning, which can lead to suboptimal results if not implemented carefully. A common augmentation technique in contrastive learning is random cropping followed by resizing. This can degrade the quality of representation learning when the two random crops contain distinct semantic content. To tackle this issue, we introduce LeOCLR (Leveraging Original Images for Contrastive Learning of Visual Representations), a framework that employs a novel instance discrimination approach and an adapted loss function. This method prevents the loss of important semantic features caused by mapping different object parts during representation learning. Our experiments demonstrate that LeOCLR consistently improves representation learning across various datasets, outperforming baseline models. For instance, LeOCLR surpasses MoCo-v2 by 5.1% on ImageNet-1K in linear evaluation and outperforms several other methods on transfer learning and object detection tasks.
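The abstract's core idea is to keep the original (uncropped) image in the instance discrimination objective so that crops are aligned with the image's global semantics rather than only with each other. The paper's exact loss is not reproduced here; the sketch below is a hypothetical InfoNCE-style variant in which each original-image embedding is pulled toward its crop's embedding, with the other crops in the batch acting as negatives. Function and variable names (`info_nce_with_original`, `z_orig`, `z_crop`) are illustrative, not the authors' implementation.

```python
import numpy as np

def info_nce_with_original(z_orig, z_crop, temperature=0.2):
    """Hypothetical InfoNCE-style loss: pull each original-image
    embedding z_orig[i] toward its crop embedding z_crop[i], using the
    other crops in the batch as negatives. Both inputs have shape (N, D).
    """
    # L2-normalize so similarities are cosine similarities
    z_orig = z_orig / np.linalg.norm(z_orig, axis=1, keepdims=True)
    z_crop = z_crop / np.linalg.norm(z_crop, axis=1, keepdims=True)
    # Pairwise similarity logits, scaled by the temperature
    logits = z_orig @ z_crop.T / temperature  # (N, N)
    # Cross-entropy with the diagonal entries as the positive pairs
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

Under this reading, the loss is minimized when a crop's embedding matches its own original image more closely than any other image in the batch, which is one way to discourage two semantically different crops of the same image from being forced together.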
Problem

Research questions and friction points this paper is trying to address.

Addresses suboptimal results from data augmentation in contrastive learning
Prevents semantic feature loss from random cropping in representation learning
Improves visual representation learning across diverse datasets and tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses original images for contrastive learning
Novel instance discrimination approach
Adapted loss function prevents feature loss
Mohammad Alkhalefi
Department of Computing Science, University of Aberdeen, UK
G. Leontidis
Department of Computing Science, University of Aberdeen, UK
Mingjun Zhong
Department of Computing Science, University of Aberdeen, UK
Applied Statistics · Machine Learning