Unleashing Diffusion and State Space Models for Medical Image Segmentation

📅 2025-06-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Medical image segmentation models exhibit limited generalization to unseen organs or novel tumor types, hindering clinical deployment. To address cross-organ and cross-modal segmentation of rare or zero-shot tumors, we propose DSM—a novel framework integrating diffusion models with state-space models (Mamba) for the first time. Our key contributions are: (1) a dual-query mechanism—where organ queries encode anatomical priors and diffusion-guided tumor queries dynamically generate representations for unseen categories; (2) synergistic alignment of CLIP text embeddings and diffusion features to enable semantic-driven zero-shot segmentation; and (3) object-aware feature grouping coupled with an enhanced attention-based decoder to improve multi-label robustness. DSM achieves significant improvements over state-of-the-art methods across multiple tumor segmentation benchmarks. The code is publicly available.

📝 Abstract
Existing segmentation models trained on a single medical imaging dataset often lack robustness when encountering unseen organs or tumors. Developing a robust model capable of identifying rare or novel tumor categories not present during training is crucial for advancing medical imaging applications. We propose DSM, a novel framework that leverages diffusion and state space models to segment unseen tumor categories beyond the training data. DSM utilizes two sets of object queries trained within modified attention decoders to enhance classification accuracy. Initially, the model learns organ queries using an object-aware feature grouping strategy to capture organ-level visual features. It then refines tumor queries by focusing on diffusion-based visual prompts, enabling precise segmentation of previously unseen tumors. Furthermore, we incorporate diffusion-guided feature fusion to improve semantic segmentation performance. By integrating CLIP text embeddings, DSM captures category-sensitive semantics that improve linguistic knowledge transfer, enhancing the model's robustness across diverse scenarios and multi-label tasks. Extensive experiments demonstrate the superior performance of DSM in various tumor segmentation tasks. Code is available at https://github.com/Rows21/KMax-Mamba.
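As an illustrative sketch only (not the authors' implementation), the CLIP-based zero-shot classification step described in the abstract can be approximated as follows: each object query's visual feature is compared against text embeddings of candidate category prompts via cosine similarity, so novel categories can be scored without category-specific training. All function names, shapes, and the temperature value here are assumptions for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def classify_queries(query_feats, text_embeds, temperature=0.07):
    """Assign each object query a category by cosine similarity to
    CLIP-style text embeddings (one embedding per category prompt).

    query_feats : (num_queries, dim) visual features from the decoder
    text_embeds : (num_classes, dim) text embeddings of category names
    Returns per-query class probabilities, shape (num_queries, num_classes).
    """
    q = l2_normalize(query_feats)
    t = l2_normalize(text_embeds)
    logits = q @ t.T / temperature                 # sharpened cosine scores
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)

# Toy example: 2 queries scored against 3 candidate categories (4-dim features).
rng = np.random.default_rng(0)
queries = rng.normal(size=(2, 4))
texts = rng.normal(size=(3, 4))
probs = classify_queries(queries, texts)
```

Because the category set is defined only by text prompts, adding an unseen tumor type amounts to appending one more text embedding row, which is what makes this style of classifier attractive for zero-shot segmentation.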
Problem

Research questions and friction points this paper is trying to address.

Enhancing robustness for unseen organ and tumor segmentation
Leveraging diffusion models to segment rare tumor categories
Improving semantic segmentation with CLIP text embeddings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages diffusion and state space models
Uses object queries for classification accuracy
Integrates CLIP text embeddings for robustness
Rong Wu, Zhejiang University
Ziqi Chen, School of Statistics, KLATASDS-MOE, East China Normal University, Shanghai, China
Liming Zhong, School of Biomedical Engineering, Southern Medical University, Guangzhou, China
Heng Li, Faculty of Biomedical Engineering, Shenzhen University of Advanced Technology, Guangzhou, China
Hai Shu, Department of Biostatistics, School of Global Public Health, New York University
High-dimensional data, neuroimaging, machine learning/deep learning