🤖 AI Summary
To address the scarcity of high-quality annotated data in computational pathology, this paper proposes a fully unsupervised specialization method for vision-language models (VLMs). Leveraging domain-specific image-text pairs automatically extracted from public repositories (e.g., TCGA, Quilt), it performs continual pretraining on general-purpose VLMs—such as CONCH and QuiltNet—to enable zero-shot or few-shot adaptation without human annotation. The approach is task-agnostic and incurs zero labeling cost. Evaluated on three representative histopathological classification tasks, it substantially improves zero-shot accuracy (average +12.3%) and achieves near full-finetuning performance under 5-shot settings; gains scale consistently with pretraining data volume. The implementation is publicly available, demonstrating the method’s effectiveness, generality, and reproducibility.
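Zero-shot classification with a VLM, as used in the evaluations above, reduces to nearest-neighbor matching between an image embedding and one text-prompt embedding per class. A minimal NumPy sketch, assuming the embeddings have already been produced by the model's image and text encoders (function and variable names here are illustrative, not taken from the paper's code):

```python
import numpy as np

def zero_shot_classify(image_emb: np.ndarray, class_text_embs: np.ndarray) -> int:
    """Return the index of the class whose text-prompt embedding is most
    cosine-similar to the image embedding.

    image_emb:       (D,) embedding of one image.
    class_text_embs: (C, D) embeddings of one text prompt per class,
                     e.g. "an H&E image of <class name>".
    """
    img = image_emb / np.linalg.norm(image_emb)
    txt = class_text_embs / np.linalg.norm(class_text_embs, axis=1, keepdims=True)
    # With both sides unit-normalized, the dot product is cosine similarity.
    return int(np.argmax(txt @ img))
```

Because the class set is expressed purely through text prompts, swapping in a new downstream task needs no retraining, which is what makes the zero-shot setting a natural probe for the specialization method.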
📝 Abstract
Recent advances in Vision-Language Models (VLMs) in histopathology, such as CONCH and QuiltNet, have demonstrated impressive zero-shot classification capabilities across various tasks. However, their general-purpose design may lead to suboptimal performance in specific downstream applications. While supervised fine-tuning methods address this issue, they require manually labeled samples for adaptation. This paper investigates annotation-free adaptation of VLMs through continued pretraining on domain- and task-relevant image-caption pairs extracted from existing databases. Our experiments on two VLMs, CONCH and QuiltNet, across three downstream tasks reveal that these pairs substantially enhance both zero-shot and few-shot performance. Notably, with larger training sizes, continued pretraining matches the performance of few-shot methods while eliminating manual labeling. Its effectiveness, task-agnostic design, and annotation-free workflow make it a promising pathway for adapting VLMs to new histopathology tasks. Code is available at https://github.com/DeepMicroscopy/Annotation-free-VLM-specialization.
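For CLIP-style VLMs such as CONCH and QuiltNet, continued pretraining on image-caption pairs typically means optimizing a symmetric image-text contrastive (InfoNCE) objective over mined pairs. A NumPy sketch of that loss, assuming a standard CLIP-style formulation (the linked repository is the authoritative source for the paper's actual training objective):

```python
import numpy as np

def clip_contrastive_loss(image_emb: np.ndarray, text_emb: np.ndarray,
                          temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss over a batch of N matched image-caption pairs.

    image_emb, text_emb: (N, D) arrays; row i of each side forms a positive
    pair, and all other rows in the batch serve as negatives.
    """
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N) pairwise similarities

    def xent_diag(l: np.ndarray) -> float:
        # Cross-entropy with the diagonal (the matched pair) as the target.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -float(np.mean(np.diag(log_probs)))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))
```

Minimizing this loss pulls each image embedding toward its own caption and away from the other captions in the batch, which is how domain-relevant pairs can specialize the encoders without any manual labels.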