🤖 AI Summary
This work proposes a deployment-oriented knowledge distillation framework to address the challenge of deploying high-accuracy 3D medical image segmentation models in resource-constrained clinical settings, where computational overhead often hinders practical application. The method enables systematic model compression under a fixed architecture without altering the inference pipeline, achieving substantial efficiency gains while preserving segmentation fidelity. Evaluated on multimodal medical imaging data, including brain MRI and abdominal CT, the approach reduces model parameters by 94% while retaining 98.7% of the teacher model's segmentation performance. It also achieves a 67% reduction in CPU inference latency, substantially improving clinical deployability without compromising diagnostic accuracy.
📝 Abstract
Deploying medical image segmentation models in routine clinical workflows is often constrained by on-premises infrastructure, where computational resources are fixed and cloud-based inference may be restricted by governance and security policies. While high-capacity models achieve strong segmentation accuracy, their computational demands hinder practical deployment and long-term maintainability in hospital environments. We present a deployment-oriented framework that leverages knowledge distillation to translate a high-performing segmentation model into a scalable family of compact student models, without modifying the inference pipeline. The proposed approach preserves architectural compatibility with existing clinical systems while enabling systematic capacity reduction. The framework is evaluated on a multi-site brain MRI dataset comprising 1,104 3D volumes, with independent testing on 101 curated cases, and is further examined on abdominal CT to assess cross-modality generalizability. Under aggressive parameter reduction (94%), the distilled student model preserves nearly all of the teacher's segmentation accuracy (98.7%), while achieving substantial efficiency gains, including up to a 67% reduction in CPU inference latency without additional deployment overhead. These results demonstrate that knowledge distillation provides a practical and reliable pathway for converting research-grade segmentation models into maintainable, deployment-ready components for on-premises clinical workflows in real-world health systems.
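The abstract describes distilling a high-capacity teacher into compact students under a fixed architecture. The paper does not specify its loss formulation here, but the standard distillation objective it builds on combines a cross-entropy term on ground-truth labels with a KL-divergence term that matches the student's temperature-softened class distribution to the teacher's. A minimal sketch of that canonical objective (the temperature `T`, weight `alpha`, and flattened per-voxel logit shapes are illustrative assumptions, not values from the paper):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Standard KD objective (Hinton-style), sketched for illustration.

    student_logits, teacher_logits: (N, C) arrays of per-voxel class logits,
        flattened over the 3D volume. labels: (N,) integer ground-truth classes.
    Returns alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE(student, labels).
    """
    p_t = softmax(teacher_logits / T)          # softened teacher distribution
    p_s = softmax(student_logits / T)          # softened student distribution
    # KL term; the T^2 factor keeps its gradient scale comparable to the CE term
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=1).mean() * T**2
    # ordinary cross-entropy on hard labels, at temperature 1
    p_hard = softmax(student_logits)
    ce = -np.log(p_hard[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * kl + (1 - alpha) * ce
```

When the student's logits exactly match the teacher's, the KL term vanishes and only the weighted cross-entropy remains, which is a useful sanity check when wiring such a loss into a training loop.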