🤖 AI Summary
To address subgroup performance disparity in vision-language models (VLMs) for glaucoma detection—caused by latent demographic biases—this paper proposes an unsupervised debiasing method that requires no protected attribute labels. First, unsupervised clustering is performed on image embeddings to infer proxy subgroups. Second, a dual-contrastive learning framework is introduced, integrating CLIP-style cross-modal contrastive learning with SimCLR-style intra-image contrastive learning; a gradient-similarity weighting mechanism dynamically adjusts the joint loss weights for the top-k hard samples. This adaptive strategy strengthens discriminative boundaries across subgroups, thereby mitigating performance gaps. Evaluated on the Harvard FairVLMed dataset, the method significantly improves both Equalized Subgroup AUC and Groupwise AUC, yielding fairer and more robust glaucoma screening without access to sensitive demographic annotations.
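The first step described above, inferring proxy subgroups without demographic labels, can be sketched as plain k-means over frozen image embeddings. This is a minimal illustration, not the paper's implementation: the function name, the choice of k-means, and the number of clusters are assumptions for demonstration.

```python
import numpy as np

def infer_proxy_subgroups(embeddings: np.ndarray, k: int = 4,
                          iters: int = 50, seed: int = 0) -> np.ndarray:
    """Cluster image embeddings with plain k-means to get proxy subgroup labels.

    Hypothetical sketch of the unsupervised-clustering step; the paper may use
    a different clustering algorithm or embedding space.
    """
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct randomly chosen embeddings.
    centroids = embeddings[rng.choice(len(embeddings), size=k, replace=False)]
    for _ in range(iters):
        # Assign every embedding to its nearest centroid (squared L2 distance).
        dists = ((embeddings[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster is empty.
        for j in range(k):
            if (labels == j).any():
                centroids[j] = embeddings[labels == j].mean(axis=0)
    return labels
```

The returned cluster indices then stand in for the unobserved protected attributes when measuring and reducing subgroup gaps.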
📝 Abstract
Vision-Language Models (VLMs) have achieved remarkable success on multimodal tasks such as image-text retrieval and zero-shot classification, yet they can exhibit demographic biases even when explicit protected attributes are absent during training. In this work, we focus on automated glaucoma screening from retinal fundus images, a critical application given that glaucoma is a leading cause of irreversible blindness and disproportionately affects underserved populations. Building on a reweighting-based contrastive learning framework, we introduce an attribute-agnostic debiasing method that (i) infers proxy subgroups via unsupervised clustering of image embeddings, (ii) computes gradient-similarity weights between the CLIP-style multimodal loss and a SimCLR-style image-pair contrastive loss, and (iii) applies these weights in a joint, top-$k$ weighted objective to upweight underperforming clusters. This label-free approach adaptively targets the hardest examples, thereby reducing subgroup disparities. We evaluate our method on the Harvard FairVLMed glaucoma subset, reporting Equalized Odds Distance (EOD), Equalized Subgroup AUC (ES AUC), and Groupwise AUC to demonstrate equitable performance across inferred demographic subgroups.
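Step (ii) of the abstract, gradient-similarity weighting between the two contrastive losses, can be illustrated with per-sample gradient vectors. This is a hedged sketch under stated assumptions: `g_clip` and `g_simclr` are hypothetical arrays of per-sample gradients (shape `[n, d]`) of the CLIP-style and SimCLR-style losses, and the specific rule of upweighting the $k$ least-aligned samples is one plausible reading, not the paper's exact formula.

```python
import numpy as np

def topk_gradient_similarity_weights(g_clip: np.ndarray, g_simclr: np.ndarray,
                                     k: int, eps: float = 1e-8) -> np.ndarray:
    """Per-sample weights from the cosine similarity of two loss gradients.

    Samples whose CLIP-style and SimCLR-style gradients disagree most (lowest
    cosine similarity) are treated as the top-k hard examples and upweighted
    in proportion to the disagreement; all other samples keep weight 1.
    """
    # Cosine similarity between the two per-sample gradient directions.
    sim = (g_clip * g_simclr).sum(-1) / (
        np.linalg.norm(g_clip, axis=-1) * np.linalg.norm(g_simclr, axis=-1) + eps)
    weights = np.ones(len(sim))
    hard = np.argsort(sim)[:k]               # k least-aligned (hardest) samples
    weights[hard] = 1.0 + (1.0 - sim[hard])  # upweight by degree of conflict
    return weights
```

In a training loop these weights would multiply the per-sample joint loss, e.g. `(w * (loss_clip + lam * loss_simclr)).mean()`, so the hardest examples in underperforming proxy clusters contribute more to the update.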