Counterfactual contrastive learning: robust representations via causal image synthesis

📅 2024-03-14
🏛️ DEMI@MICCAI
📈 Citations: 4
Influential: 0

🤖 AI Summary
Contrastive pretraining in limited-label settings is sensitive to the choice of data augmentation: positive pairs must preserve semantic content while removing domain-specific information. This work proposes CF-SimCLR, a positive-pair construction method built on approximate counterfactual inference that brings causal image synthesis into the SimCLR framework. Instead of emulating domain changes with hand-crafted photometric transformations, it generates counterfactual images that simulate realistic domain changes while keeping the semantics intact. Evaluated on five chest radiography and mammography datasets, the method improves downstream performance on both in-distribution and out-of-distribution data and is more robust to acquisition shift, particularly for domains that are under-represented during training.

📝 Abstract
Contrastive pretraining is well-known to improve downstream task performance and model generalisation, especially in limited label settings. However, it is sensitive to the choice of augmentation pipeline. Positive pairs should preserve semantic information while destroying domain-specific information. Standard augmentation pipelines emulate domain-specific changes with pre-defined photometric transformations, but what if we could simulate realistic domain changes instead? In this work, we show how to utilise recent progress in counterfactual image generation to this effect. We propose CF-SimCLR, a counterfactual contrastive learning approach which leverages approximate counterfactual inference for positive pair creation. Comprehensive evaluation across five datasets, on chest radiography and mammography, demonstrates that CF-SimCLR substantially improves robustness to acquisition shift with higher downstream performance on both in- and out-of-distribution data, particularly for domains which are under-represented during training.
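
The positive-pair idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function `make_positive_pair`, the scanner-style domain names, and the toy `toy_counterfactual` and `toy_augment` stand-ins are all assumptions made for illustration. A real setup would plug in a trained counterfactual image generator and a proper augmentation pipeline.

```python
import random
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]  # toy stand-in for a 2D grayscale image


def make_positive_pair(
    image: Image,
    source_domain: str,
    all_domains: Sequence[str],
    generate_counterfactual: Callable[[Image, str], Image],  # hypothetical model
    augment: Callable[[Image], Image],
    rng: random.Random,
) -> Tuple[Image, Image, str]:
    """Return (augmented real view, augmented counterfactual view, target domain)."""
    # Choose a target domain that differs from the image's own domain.
    candidates = [d for d in all_domains if d != source_domain]
    if not candidates:
        # Only one domain is known: fall back to a standard two-view pair.
        return augment(image), augment(image), source_domain
    target = rng.choice(candidates)
    # Simulate "this image as if acquired in the target domain".
    counterfactual = generate_counterfactual(image, target)
    return augment(image), augment(counterfactual), target


# Toy stand-ins (assumptions) so the sketch runs end to end.
GAINS = {"scanner_a": 1.0, "scanner_b": 0.8, "scanner_c": 1.2}


def toy_counterfactual(image: Image, target_domain: str) -> Image:
    # Pretend a domain change is just an intensity gain, clipped to 1.0.
    gain = GAINS[target_domain]
    return [[min(1.0, px * gain) for px in row] for row in image]


def toy_augment(image: Image) -> Image:
    # Deterministic horizontal flip, standing in for random augmentation.
    return [row[::-1] for row in image]


rng = random.Random(0)
img = [[0.1, 0.2], [0.3, 0.4]]
view_a, view_b, target = make_positive_pair(
    img, "scanner_a", list(GAINS), toy_counterfactual, toy_augment, rng
)
```

The two views share the same underlying image content but differ in simulated acquisition domain, which is the property the abstract asks of a positive pair.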
Problem

Research questions and friction points this paper is trying to address.

Enhance robustness to acquisition shift in medical imaging
Improve performance on in- and out-of-distribution data
Address under-represented domains in contrastive learning
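
One simple way to make these goals measurable is to report downstream accuracy separately for each acquisition domain, so that under-represented or unseen domains are not hidden by an overall average. The helper below is a hedged sketch: the name `accuracy_by_domain` and the toy labels are illustrative assumptions, not the paper's evaluation protocol.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def accuracy_by_domain(
    labels: Sequence[int],
    predictions: Sequence[int],
    domains: Sequence[Hashable],
) -> Dict[Hashable, float]:
    """Compute classification accuracy separately for each domain."""
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for y, p, d in zip(labels, predictions, domains):
        total[d] += 1
        correct[d] += int(y == p)
    return {d: correct[d] / total[d] for d in total}


# Toy example: domain "a" is fully correct, domain "b" is half correct.
per_domain = accuracy_by_domain(
    labels=[1, 0, 1, 1],
    predictions=[1, 0, 0, 1],
    domains=["a", "a", "b", "b"],
)
```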
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses counterfactual image generation for augmentation
Leverages causal inference for positive pairs
Improves robustness to acquisition shifts
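
Once positive pairs exist, a SimCLR-style objective pulls the two views of each pair together and pushes them away from the other samples in the batch. The sketch below implements the standard NT-Xent loss used by SimCLR in plain Python for readability. It assumes the embeddings have already been computed and are non-zero; the function name and the temperature value are illustrative choices, and a practical version would use a tensor library.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    # L2-normalize so that dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def nt_xent_loss(
    views_a: Sequence[Vector],
    views_b: Sequence[Vector],
    temperature: float = 0.5,
) -> float:
    """NT-Xent loss where views_a[i] and views_b[i] form a positive pair."""
    n = len(views_a)
    z = [_normalize(v) for v in list(views_a) + list(views_b)]
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the matching view
        # Similarities to every other sample (the positive is included).
        logits = [_dot(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        pos_logit = _dot(z[i], z[pos]) / temperature
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - pos_logit
    return total / (2 * n)


# Toy check: correctly matched pairs should score a lower loss than swapped ones.
a = [[1.0, 0.0], [0.0, 1.0]]
loss_matched = nt_xent_loss(a, [[1.0, 0.0], [0.0, 1.0]])
loss_swapped = nt_xent_loss(a, [[0.0, 1.0], [1.0, 0.0]])
```

In a counterfactual setup, `views_b[i]` would be the embedding of the counterfactual view of sample `i` rather than a second hand-crafted augmentation; the loss itself is unchanged.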