Pathryoshka: Compressing Pathology Foundation Models via Multi-Teacher Knowledge Distillation with Nested Embeddings

📅 2025-11-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Pathology foundation models (FMs) achieve strong performance but suffer from excessive parameter counts (>1B) and high-dimensional embeddings, hindering research exploration and clinical deployment in resource-constrained settings. To address this, we propose a compression framework for pathology FMs that integrates multi-teacher knowledge distillation and nested representation learning. The former leverages heterogeneous teacher models to collaboratively supervise student training, improving knowledge-transfer fidelity; the latter constructs a hierarchical, dimension-flexible embedding space that jointly optimizes model compactness and downstream-task adaptability. Experiments on ten public benchmarks show that the compressed models reduce model size by 86–92%, match the performance of the original large-scale teachers, outperform single-teacher distillation baselines of comparable size by a median margin of 7.0 percentage points in accuracy, and support customizable embedding dimensions for efficient local deployment.
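
How the two components might interact during training: every nested prefix of the student embedding is distilled against every teacher. The PyTorch sketch below is a minimal illustration under assumed details; the prefix lengths, teacher dimensions, per-teacher projection heads, and cosine-similarity loss are hypothetical choices, not specifics taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes -- not from the paper.
NEST_DIMS = [64, 128, 256, 512]    # nested prefix lengths (Matryoshka-style)
D = NEST_DIMS[-1]                  # full student embedding dimension
TEACHER_DIMS = [1024, 1536]        # e.g. two heterogeneous frozen teachers

# One projection head per teacher. Using only the first d input columns
# of each head keeps it consistent across all nested prefixes (the
# weight-sharing trick from Matryoshka Representation Learning).
heads = nn.ModuleList(nn.Linear(D, dt, bias=False) for dt in TEACHER_DIMS)

def multi_teacher_nested_loss(student_emb, teacher_embs):
    """student_emb: (B, D); teacher_embs: list of detached (B, D_t) tensors."""
    loss = 0.0
    for d in NEST_DIMS:                       # each prefix imitates every teacher
        prefix = student_emb[:, :d]
        for head, t_emb in zip(heads, teacher_embs):
            pred = F.linear(prefix, head.weight[:, :d])  # first d columns only
            loss = loss + (1 - F.cosine_similarity(pred, t_emb, dim=-1)).mean()
    return loss / (len(NEST_DIMS) * len(teacher_embs))
```

Because every prefix is optimized against all teachers, a truncated embedding is a trained sub-representation rather than an arbitrary slice of the full vector.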

📝 Abstract
Pathology foundation models (FMs) have driven significant progress in computational pathology. However, these high-performing models can easily exceed a billion parameters and produce high-dimensional embeddings, thus limiting their applicability for research or clinical use when computing resources are tight. Here, we introduce Pathryoshka, a multi-teacher distillation framework inspired by RADIO distillation and Matryoshka Representation Learning to reduce pathology FM sizes while allowing for adaptable embedding dimensions. We evaluate our framework with a distilled model on ten public pathology benchmarks with varying downstream tasks. Compared to its much larger teachers, Pathryoshka reduces the model size by 86–92% at on-par performance. It outperforms state-of-the-art single-teacher distillation models of comparable size by a median margin of 7.0 percentage points in accuracy. By enabling efficient local deployment without sacrificing accuracy or representational richness, Pathryoshka democratizes access to state-of-the-art pathology FMs for the broader research and clinical community.
Problem

Research questions and friction points this paper is trying to address.

Compressing large pathology foundation models to reduce computational requirements
Enabling adaptable embedding dimensions while maintaining model performance
Democratizing access to pathology AI models for resource-constrained environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-teacher knowledge distillation for model compression
Nested embeddings for adaptable representation dimensions (see the inference sketch after this list)
Reduces model size by 86–92% while maintaining performance
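
At inference time, the nested structure means one forward pass can serve several embedding budgets. A hypothetical usage sketch with a placeholder encoder (the real student is a large vision backbone; names and sizes here are illustrative):

```python
import torch
import torch.nn.functional as F

student = torch.nn.Linear(768, 512)   # placeholder for the distilled encoder
features = torch.randn(4, 768)        # stand-in for 4 patch features

with torch.no_grad():
    emb = student(features)           # (4, 512): full-fidelity embedding
    # Any leading prefix is itself a valid embedding: just slice and
    # renormalize -- no retraining and no second model needed.
    emb_small = F.normalize(emb[:, :128], dim=-1)  # 4x cheaper to store/search
```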
👥 Authors
Christian Grashei
Technical University of Munich, Helmholtz Munich, Munich Data Science Institute, Munich Center for Machine Learning
Christian Brechenmacher
Helmholtz Munich
Rao Muhammad Umer
Helmholtz Munich
Jingsong Liu
Technical University of Munich, Munich Center for Machine Learning
Carsten Marr
Institute of AI for Health @ Helmholtz Munich & Clinics @ LMU München
AI for Biomed & Health
Ewa Szczurek
Associate Professor at University of Warsaw / Institute of AI for Health, Helmholtz Zentrum München
computational biology · machine learning · artificial intelligence
Peter J. Schüffler
Technical University of Munich, Munich Data Science Institute, Munich Center for Machine Learning