🤖 AI Summary
In computational pathology, domain shifts arising from variations in staining protocols and scanning devices severely hinder cross-center generalization of deep learning models. To address this, we propose a vision-language model (VLM)-based knowledge distillation framework, leveraging the pathology-pretrained PLIP model as the teacher. Our key innovation is the first introduction of a **domain-invariant continuous prompt tuning mechanism**: domain-specific prompt embeddings are learned separately across centers and then averaged token-wise, enabling class-agnostic, semantics-free prompt learning without manual textual annotations. This approach eliminates reliance on domain-specific prior knowledge inherent in discrete prompting, thereby enhancing zero-shot transfer robustness. Evaluated on multiple multi-center histopathology benchmarks, our method consistently outperforms existing state-of-the-art methods, achieving average F1-score gains of 3.2–5.7 percentage points. The framework provides a scalable, plug-and-play solution for domain generalization in heterogeneous clinical settings.
📝 Abstract
Domain generalization is critical in computational pathology (CPath) due to inherent domain shifts caused by variations in staining protocols, scanner devices, and imaging settings across clinical centers. Vision-language models (VLMs), such as PLIP-a pathology-tuned CLIP-trained on image-text pairs across diverse domains, serve as strong knowledge distillation sources. However, their zero-shot performance with predefined prompts remains limited due to sensitivity to prompt variations. Moreover, unlike natural images, histopathology centers lack semantic descriptors (e.g., 'sketch'), making it difficult to define domain-specific prompts for clinical centers. This requires a data-driven approach for learning domain-specific and ultimately class-generic continuous prompts. We propose Domain Invariant Prompt Tuning (DIPT) for knowledge distillation process, a novel step that learns multiple input tokens for each domain. These tokens are trained separately for each domain and are averaged across domains, leading to domain-invariant prompts. Our student model then distills knowledge from PLIP's text encoder by leveraging the prompts learned by DIPT. This leads to alignment of visual features with domain-invariant embeddings, enhancing generalization by training on multiple domains. Our method adds a significant improvement in average F1-score to existing state-of-the-art (SOTA) knowledge distillation approaches in domain generalization with histopathology datasets. This work helps the way of deploying robust CPath models in real-world clinical problems with heterogeneous data sources.