All Centers Are at most a Few Tokens Apart: Knowledge Distillation with Domain Invariant Prompt Tuning

📅 2025-11-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In computational pathology, domain shifts arising from variations in staining protocols and scanning devices severely hinder cross-center generalization of deep learning models. To address this, we propose a vision-language model (VLM)-based knowledge distillation framework, leveraging the pathology-pretrained PLIP model as the teacher. Our key innovation is the first introduction of a **domain-invariant continuous prompt tuning mechanism**: domain-specific prompt embeddings are learned separately across centers and then averaged token-wise, enabling class-agnostic, semantics-free prompt learning without manual textual annotations. This approach eliminates reliance on domain-specific prior knowledge inherent in discrete prompting, thereby enhancing zero-shot transfer robustness. Evaluated on multiple multi-center histopathology benchmarks, our method consistently outperforms existing state-of-the-art methods, achieving average F1-score gains of 3.2–5.7 percentage points. The framework provides a scalable, plug-and-play solution for domain generalization in heterogeneous clinical settings.

Technology Category

Application Category

📝 Abstract
Domain generalization is critical in computational pathology (CPath) due to inherent domain shifts caused by variations in staining protocols, scanner devices, and imaging settings across clinical centers. Vision-language models (VLMs), such as PLIP-a pathology-tuned CLIP-trained on image-text pairs across diverse domains, serve as strong knowledge distillation sources. However, their zero-shot performance with predefined prompts remains limited due to sensitivity to prompt variations. Moreover, unlike natural images, histopathology centers lack semantic descriptors (e.g., 'sketch'), making it difficult to define domain-specific prompts for clinical centers. This requires a data-driven approach for learning domain-specific and ultimately class-generic continuous prompts. We propose Domain Invariant Prompt Tuning (DIPT) for knowledge distillation process, a novel step that learns multiple input tokens for each domain. These tokens are trained separately for each domain and are averaged across domains, leading to domain-invariant prompts. Our student model then distills knowledge from PLIP's text encoder by leveraging the prompts learned by DIPT. This leads to alignment of visual features with domain-invariant embeddings, enhancing generalization by training on multiple domains. Our method adds a significant improvement in average F1-score to existing state-of-the-art (SOTA) knowledge distillation approaches in domain generalization with histopathology datasets. This work helps the way of deploying robust CPath models in real-world clinical problems with heterogeneous data sources.
Problem

Research questions and friction points this paper is trying to address.

Addresses domain shifts in computational pathology from staining and scanner variations.
Improves vision-language model generalization by learning domain-invariant prompts via distillation.
Enhances histopathology model robustness for real-world clinical data heterogeneity.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-specific continuous prompts learned via DIPT
Averaging tokens across domains for invariance
Knowledge distillation from PLIP using domain-invariant prompts
🔎 Similar Papers
No similar papers found.
A
Amir Mohammad Ezzati
Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
A
Alireza Malekhosseini
Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
A
Armin Khosravi
Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
Mohammad Hossein Rohban
Mohammad Hossein Rohban
Associate Professor in Computer Engineering, Sharif University of Technology
Machine LearningStatisticsComputational Biology