Unleashing Diffusion and State Space Models for Medical Image Segmentation

📅 2025-06-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Medical image segmentation models exhibit limited generalization to unseen organs or novel tumor types, hindering clinical deployment. To address cross-organ and cross-modal segmentation of rare or zero-shot tumors, we propose DSM—a novel framework integrating diffusion models with state-space models (Mamba) for the first time. Our key contributions are: (1) a dual-query mechanism—where organ queries encode anatomical priors and diffusion-guided tumor queries dynamically generate representations for unseen categories; (2) synergistic alignment of CLIP text embeddings and diffusion features to enable semantic-driven zero-shot segmentation; and (3) object-aware feature grouping coupled with an enhanced attention-based decoder to improve multi-label robustness. DSM achieves significant improvements over state-of-the-art methods across multiple tumor segmentation benchmarks. The code is publicly available.

📝 Abstract
Existing segmentation models trained on a single medical imaging dataset often lack robustness when encountering unseen organs or tumors. Developing a robust model capable of identifying rare or novel tumor categories not present during training is crucial for advancing medical imaging applications. We propose DSM, a novel framework that leverages diffusion and state space models to segment unseen tumor categories beyond the training data. DSM utilizes two sets of object queries trained within modified attention decoders to enhance classification accuracy. Initially, the model learns organ queries using an object-aware feature grouping strategy to capture organ-level visual features. It then refines tumor queries by focusing on diffusion-based visual prompts, enabling precise segmentation of previously unseen tumors. Furthermore, we incorporate diffusion-guided feature fusion to improve semantic segmentation performance. By integrating CLIP text embeddings, DSM captures category-sensitive semantics that improve linguistic knowledge transfer, enhancing the model's robustness across diverse scenarios and multi-label tasks. Extensive experiments demonstrate the superior performance of DSM in various tumor segmentation tasks. Code is available at https://github.com/Rows21/KMax-Mamba.
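As an illustrative sketch only (not the authors' implementation), the CLIP-based zero-shot classification step described in the abstract can be approximated as follows: each object query's visual feature is compared against text embeddings of candidate category prompts via cosine similarity, so novel categories can be scored without category-specific training. All function names, shapes, and the temperature value here are assumptions for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def classify_queries(query_feats, text_embeds, temperature=0.07):
    """Assign each object query a category by cosine similarity to
    CLIP-style text embeddings (one embedding per category prompt).

    query_feats : (num_queries, dim) visual features from the decoder
    text_embeds : (num_classes, dim) text embeddings of category names
    Returns per-query class probabilities, shape (num_queries, num_classes).
    """
    q = l2_normalize(query_feats)
    t = l2_normalize(text_embeds)
    logits = q @ t.T / temperature                 # sharpened cosine scores
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)

# Toy example: 2 queries scored against 3 candidate categories (4-dim features).
rng = np.random.default_rng(0)
queries = rng.normal(size=(2, 4))
texts = rng.normal(size=(3, 4))
probs = classify_queries(queries, texts)
```

Because the category set is defined only by text prompts, adding an unseen tumor type amounts to appending one more text embedding row, which is what makes this style of classifier attractive for zero-shot segmentation.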
Problem

Research questions and friction points this paper is trying to address.

Enhancing robustness for unseen organ and tumor segmentation
Leveraging diffusion models to segment rare tumor categories
Improving semantic segmentation with CLIP text embeddings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages diffusion and state space models
Uses object queries for classification accuracy
Integrates CLIP text embeddings for robustness
Rong Wu, Zhejiang University
Ziqi Chen, School of Statistics, KLATASDS-MOE, East China Normal University, Shanghai, China
Liming Zhong, School of Biomedical Engineering, Southern Medical University, Guangzhou, China
Heng Li, Faculty of Biomedical Engineering, Shenzhen University of Advanced Technology, Guangzhou, China
Hai Shu, Department of Biostatistics, School of Global Public Health, New York University
High-dimensional data, neuroimaging, machine learning/deep learning