Pathryoshka: Compressing Pathology Foundation Models via Multi-Teacher Knowledge Distillation with Nested Embeddings

📅 2025-11-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Pathology foundation models (FMs) achieve strong performance but suffer from excessive parameter counts (>1B) and high-dimensional embeddings, hindering research exploration and clinical deployment in resource-constrained settings. To address this, we propose a compression framework for pathology FMs that integrates multi-teacher knowledge distillation and nested representation learning. The former leverages heterogeneous teacher models to collaboratively supervise student training, improving knowledge-transfer fidelity; the latter constructs a hierarchical, dimension-flexible embedding space that jointly optimizes model compactness and downstream-task adaptability. Experiments on ten public benchmarks show that the compressed models reduce model size by 86–92%, match the performance of the original large-scale teachers, outperform single-teacher distillation baselines of comparable size by a median margin of 7.0 percentage points in accuracy, and support customizable embedding dimensions for efficient local deployment.
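
How the two components might interact during training: every nested prefix of the student embedding is distilled against every teacher. The PyTorch sketch below is a minimal illustration under assumed details; the prefix lengths, teacher dimensions, per-teacher projection heads, and cosine-similarity loss are hypothetical choices, not specifics taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes -- not from the paper.
NEST_DIMS = [64, 128, 256, 512]    # nested prefix lengths (Matryoshka-style)
D = NEST_DIMS[-1]                  # full student embedding dimension
TEACHER_DIMS = [1024, 1536]        # e.g. two heterogeneous frozen teachers

# One projection head per teacher. Using only the first d input columns
# of each head keeps it consistent across all nested prefixes (the
# weight-sharing trick from Matryoshka Representation Learning).
heads = nn.ModuleList(nn.Linear(D, dt, bias=False) for dt in TEACHER_DIMS)

def multi_teacher_nested_loss(student_emb, teacher_embs):
    """student_emb: (B, D); teacher_embs: list of detached (B, D_t) tensors."""
    loss = 0.0
    for d in NEST_DIMS:                       # each prefix imitates every teacher
        prefix = student_emb[:, :d]
        for head, t_emb in zip(heads, teacher_embs):
            pred = F.linear(prefix, head.weight[:, :d])  # first d columns only
            loss = loss + (1 - F.cosine_similarity(pred, t_emb, dim=-1)).mean()
    return loss / (len(NEST_DIMS) * len(teacher_embs))
```

Because every prefix is optimized against all teachers, a truncated embedding is a trained sub-representation rather than an arbitrary slice of the full vector.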

📝 Abstract
Pathology foundation models (FMs) have driven significant progress in computational pathology. However, these high-performing models can easily exceed a billion parameters and produce high-dimensional embeddings, thus limiting their applicability for research or clinical use when computing resources are tight. Here, we introduce Pathryoshka, a multi-teacher distillation framework inspired by RADIO distillation and Matryoshka Representation Learning to reduce pathology FM sizes while allowing for adaptable embedding dimensions. We evaluate our framework with a distilled model on ten public pathology benchmarks with varying downstream tasks. Compared to its much larger teachers, Pathryoshka reduces the model size by 86–92% at on-par performance. It outperforms state-of-the-art single-teacher distillation models of comparable size by a median margin of 7.0 percentage points in accuracy. By enabling efficient local deployment without sacrificing accuracy or representational richness, Pathryoshka democratizes access to state-of-the-art pathology FMs for the broader research and clinical community.
Problem

Research questions and friction points this paper is trying to address.

Compressing large pathology foundation models to reduce computational requirements
Enabling adaptable embedding dimensions while maintaining model performance
Democratizing access to pathology AI models for resource-constrained environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-teacher knowledge distillation for model compression
Nested embeddings for adaptable representation dimensions (see the inference sketch after this list)
Reduces model size by 86–92% while maintaining performance
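
At inference time, the nested structure means one forward pass can serve several embedding budgets. A hypothetical usage sketch with a placeholder encoder (the real student is a large vision backbone; names and sizes here are illustrative):

```python
import torch
import torch.nn.functional as F

student = torch.nn.Linear(768, 512)   # placeholder for the distilled encoder
features = torch.randn(4, 768)        # stand-in for 4 patch features

with torch.no_grad():
    emb = student(features)           # (4, 512): full-fidelity embedding
    # Any leading prefix is itself a valid embedding: just slice and
    # renormalize -- no retraining and no second model needed.
    emb_small = F.normalize(emb[:, :128], dim=-1)  # 4x cheaper to store/search
```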
👥 Authors
Christian Grashei
Technical University of Munich, Helmholtz Munich, Munich Data Science Institute, Munich Center for Machine Learning
Christian Brechenmacher
Helmholtz Munich
Rao Muhammad Umer
Helmholtz Munich
Jingsong Liu
Technical University of Munich, Munich Center for Machine Learning
Carsten Marr
Institute of AI for Health @ Helmholtz Munich & Clinics @ LMU München
AI for Biomed & Health
Ewa Szczurek
Associate Professor at University of Warsaw / Institute of AI for Health, Helmholtz Zentrum München
computational biology · machine learning · artificial intelligence
Peter J. Schüffler
Technical University of Munich, Munich Data Science Institute, Munich Center for Machine Learning