🤖 AI Summary
Current medical image segmentation models are highly fragmented, hindering the integration of heterogeneous knowledge sources such as anatomical priors, exemplar-based reasoning, and interactive refinement. To address this, we propose the first general-purpose medical image segmentation framework that unifies semantic priors, contextual exemplars, and interactive feedback into a single coherent modeling paradigm. Methodologically, we introduce a dual-prompt representation, comprising 1-D sparse and 2-D dense prompts, together with a Mixture-of-Experts (MoE) decoder with dynamic routing, enabling joint encoding of and adaptive switching among heterogeneous knowledge modalities. The framework achieves unified segmentation across imaging modalities (CT, MRI, X-ray, pathology, ultrasound), anatomical structures, and clinical tasks. Evaluated on 18 public benchmarks, it establishes new state-of-the-art performance, demonstrating strong generalizability and clinical adaptability.
📝 Abstract
Medical image segmentation is fundamental to clinical decision-making, yet existing models remain fragmented. They are usually trained on single knowledge sources and specific to individual tasks, modalities, or organs. This fragmentation contrasts sharply with clinical practice, where experts seamlessly integrate diverse knowledge: anatomical priors from training, exemplar-based reasoning from reference cases, and iterative refinement through real-time interaction. We present $\textbf{K-Prism}$, a unified segmentation framework that mirrors this clinical flexibility by systematically integrating three knowledge paradigms: (i) $\textit{semantic priors}$ learned from annotated datasets, (ii) $\textit{in-context knowledge}$ from few-shot reference examples, and (iii) $\textit{interactive feedback}$ from user inputs such as clicks or scribbles. Our key insight is that these heterogeneous knowledge sources can be encoded into a dual-prompt representation: 1-D sparse prompts defining $\textit{what}$ to segment and 2-D dense prompts indicating $\textit{where}$ to attend, which are then dynamically routed through a Mixture-of-Experts (MoE) decoder. This design enables flexible switching between paradigms and joint training across diverse tasks without architectural modifications. Comprehensive experiments on 18 public datasets spanning diverse modalities (CT, MRI, X-ray, pathology, ultrasound, etc.) demonstrate that K-Prism achieves state-of-the-art performance across semantic, in-context, and interactive segmentation settings. Code will be released upon publication.
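To make the dual-prompt routing idea concrete, here is a minimal NumPy sketch of a gated mixture of experts driven by a 1-D sparse prompt and a pooled 2-D dense prompt. All dimensions, the mean-pooling step, the soft (rather than top-k) gating, and the linear experts are illustrative assumptions, not K-Prism's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

d = 16          # embedding dimension (assumed)
n_experts = 3   # e.g. one expert per knowledge paradigm (assumed)

# 1-D sparse prompt: a token embedding encoding *what* to segment.
sparse_prompt = rng.normal(size=(d,))

# 2-D dense prompt: a spatial map (here 8x8) encoding *where* to attend,
# mean-pooled into a vector for the router.
dense_prompt = rng.normal(size=(8, 8, d))
pooled_dense = dense_prompt.mean(axis=(0, 1))

# Router: gating weights computed from both prompt summaries.
W_gate = rng.normal(size=(2 * d, n_experts))
gate = softmax(np.concatenate([sparse_prompt, pooled_dense]) @ W_gate)

# Experts: simple linear heads; outputs mixed by the gate (soft routing).
experts = rng.normal(size=(n_experts, d, d))
expert_outs = np.stack([pooled_dense @ experts[k] for k in range(n_experts)])
decoded = (gate[:, None] * expert_outs).sum(axis=0)  # shape (16,)
```

In a real decoder the router would act per token and could use hard top-k routing; the point here is only that one gating signal, built from both prompt types, selects how much each expert contributes.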