VLM-CPL: Consensus Pseudo Labels from Vision-Language Models for Human Annotation-Free Pathological Image Classification

📅 2024-03-23

🏛️ arXiv.org

📈 Citations: 4

✨ Influential: 0

🤖 AI Summary

Pathological image classification depends heavily on labor-intensive manual annotation. To reduce this dependence, we propose VLM-CPL, a human annotation-free pseudo-labeling framework that requires no manually labeled data. First, a pre-trained vision-language model (VLM) performs zero-shot inference on multiple augmented views of each image to produce prompt-based pseudo-labels with uncertainty estimates. Next, feature-based pseudo-labels are obtained by clustering samples in the VLM feature space, and a prompt-feature consensus step keeps only the samples on which the two kinds of pseudo-label agree. A High-confidence Cross Supervision (HCS) strategy then learns from the reliable samples together with the remaining unlabeled ones. The method thus combines multi-view uncertainty estimation, feature-space clustering, and semi-supervised learning. Evaluated on the HPH and LC25K histopathology datasets, our approach achieves 87.1% and 95.1% classification accuracy, respectively, outperforming existing zero-shot and noisy-label learning methods.
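The first step, zero-shot inference with multi-view uncertainty estimation, can be illustrated with a minimal sketch. It assumes the VLM has already produced one vector of image-text similarity logits per augmented view. The function names and the use of the entropy of the view-averaged prediction as the uncertainty score are illustrative assumptions, not details taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multi_view_pseudo_label(view_logits):
    """Return (pseudo_label, uncertainty) for one image.

    view_logits: list of per-view logit lists, one per augmented view.
    Assumption: uncertainty is the entropy of the view-averaged probabilities.
    """
    probs = [softmax(v) for v in view_logits]
    n_classes = len(probs[0])
    # Average the class probabilities across all augmented views.
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    label = max(range(n_classes), key=lambda c: mean[c])
    entropy = -sum(p * math.log(p) for p in mean if p > 0)
    return label, entropy


# Views that agree give low uncertainty; views that disagree give high uncertainty.
label_a, unc_a = multi_view_pseudo_label([[5.0, 0.0], [4.0, 0.0], [6.0, 0.0]])
label_b, unc_b = multi_view_pseudo_label([[5.0, 0.0], [0.0, 5.0]])
```

Samples whose uncertainty falls below some cutoff would then be kept as candidates for the consensus step; how that cutoff is chosen is left open here.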

📝 Abstract

Although deep learning methods have achieved remarkable performance in pathology image classification, they rely heavily on labeled data, demanding extensive human annotation effort. In this study, we present a novel human annotation-free method for pathology image classification that leverages pre-trained Vision-Language Models (VLMs). Without human annotation, pseudo labels for the training set are obtained using the zero-shot inference capabilities of a VLM; these labels may contain considerable noise due to the domain shift between the pre-training data and the target dataset. To address this issue, we introduce VLM-CPL, a novel approach based on consensus pseudo labels that integrates two noisy-label filtering techniques with a semi-supervised learning strategy. Specifically, we first obtain prompt-based pseudo labels with uncertainty estimation by zero-shot inference with the VLM using multiple augmented views of an input. Then, by leveraging the feature representation ability of the VLM, we obtain feature-based pseudo labels via sample clustering in the feature space. Prompt-feature consensus is introduced to select reliable samples based on the agreement between the two types of pseudo labels. After rejecting low-quality pseudo labels, we further propose High-confidence Cross Supervision (HCS) to learn from both the samples with reliable pseudo labels and the remaining unlabeled samples. Experimental results show that our method obtains an accuracy of 87.1% and 95.1% on the HPH and LC25K datasets, respectively, and substantially outperforms existing zero-shot classification and noisy label learning methods. The code is available at https://github.com/lanfz2000/VLM-CPL.
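The abstract does not spell out how High-confidence Cross Supervision is computed, so the following is only one plausible reading, sketched under stated assumptions: two networks are trained together, reliable samples are supervised by their pseudo labels, and on the remaining unlabeled samples each network's prediction supervises the other only when its confidence reaches a threshold. The name `hcs_loss`, the threshold `tau`, and the two-network setup are hypothetical choices made for illustration.

```python
import math


def cross_entropy(probs, target):
    """Cross-entropy of one probability vector against a hard label."""
    return -math.log(max(probs[target], 1e-12))


def hcs_loss(probs_a, probs_b, pseudo_labels, tau=0.9):
    """Return (supervised_term, cross_term) for a batch.

    probs_a, probs_b: per-sample softmax outputs of two networks (assumed setup).
    pseudo_labels: int for samples with a reliable pseudo label, None otherwise.
    tau: hypothetical confidence threshold for cross supervision.
    """
    sup, cross, n_sup, n_cross = 0.0, 0.0, 0, 0
    for pa, pb, y in zip(probs_a, probs_b, pseudo_labels):
        if y is not None:
            # Reliable sample: both networks learn from the pseudo label.
            sup += cross_entropy(pa, y) + cross_entropy(pb, y)
            n_sup += 1
            continue
        # Unlabeled sample: a network teaches its peer only when confident.
        hard_a = max(range(len(pa)), key=lambda c: pa[c])
        hard_b = max(range(len(pb)), key=lambda c: pb[c])
        if pa[hard_a] >= tau:
            cross += cross_entropy(pb, hard_a)
            n_cross += 1
        if pb[hard_b] >= tau:
            cross += cross_entropy(pa, hard_b)
            n_cross += 1
    sup_term = sup / n_sup if n_sup else 0.0
    cross_term = cross / n_cross if n_cross else 0.0
    return sup_term, cross_term


# One reliable sample, one confident unlabeled sample, one ambiguous sample.
batch_a = [[0.9, 0.1], [0.95, 0.05], [0.6, 0.4]]
batch_b = [[0.8, 0.2], [0.7, 0.3], [0.55, 0.45]]
sup_term, cross_term = hcs_loss(batch_a, batch_b, [0, None, None], tau=0.9)
```

In this toy batch, only the second sample contributes to the cross term, because only the first network is confident there; the ambiguous third sample is ignored by both directions.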

Problem

Research questions and friction points this paper is trying to address.

Reduces human annotation in pathological image classification

Filters noisy pseudo-labels from Vision-Language Models (VLMs)

Improves accuracy in patch-level and slide-level cancer diagnosis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages Vision-Language Models for pseudo-labels

Uses consensus filtering to reduce label noise

Incorporates open-set prompting for patch enhancement
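The consensus filtering named in the list above can be sketched as follows. The sketch assumes that cluster assignments over the VLM image features have already been computed by some clustering algorithm, and that each cluster is mapped to a class by majority vote over the prompt-based labels of its members. Both the function name and the majority-vote mapping are illustrative assumptions rather than the paper's stated procedure.

```python
from collections import Counter


def prompt_feature_consensus(prompt_labels, cluster_ids):
    """Return (feature_labels, reliable_indices).

    prompt_labels: zero-shot (prompt-based) pseudo label per sample.
    cluster_ids: cluster assignment per sample from feature-space clustering.
    Assumption: a cluster takes the majority prompt-based label of its members.
    """
    votes = {}
    for y, c in zip(prompt_labels, cluster_ids):
        votes.setdefault(c, Counter())[y] += 1
    cluster_to_class = {c: cnt.most_common(1)[0][0] for c, cnt in votes.items()}
    # Feature-based pseudo label: the class assigned to the sample's cluster.
    feature_labels = [cluster_to_class[c] for c in cluster_ids]
    # Keep only samples where the two kinds of pseudo label agree.
    reliable = [
        i
        for i, (yp, yf) in enumerate(zip(prompt_labels, feature_labels))
        if yp == yf
    ]
    return feature_labels, reliable


# Six samples in two clusters; one sample per cluster disagrees with its cluster.
feature_labels, reliable = prompt_feature_consensus(
    [0, 0, 1, 1, 1, 0], [7, 7, 7, 3, 3, 3]
)
```

Samples outside `reliable` are not discarded in the approach described here; they are treated as unlabeled data for the semi-supervised stage.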

🔎 Similar Papers

Multi-modal vision-language model for generalizable annotation-free pathology localization and clinical diagnosis