Advancing Generalizable Tumor Segmentation with Anomaly-Aware Open-Vocabulary Attention Maps and Frozen Foundation Diffusion Models

📅 2025-05-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenging zero-shot universal tumor segmentation problem across anatomical regions and multimodal medical imaging, without requiring lesion-class-specific training. Methodologically: (1) it freezes a pre-trained medical diffusion model to leverage its robust representation capacity; (2) it introduces a text-driven, anomaly-aware open-vocabulary attention mechanism to achieve anatomy–semantics alignment; and (3) it incorporates latent-space pseudo-healthy reconstruction coupled with pixel- and feature-level dual residual learning to enhance anomaly localization and boundary precision. Key contributions include the first anomaly-aware open-vocabulary attention mechanism, implicit inpainting driven by a frozen diffusion model, and multi-granularity residual modeling. The framework achieves state-of-the-art performance across four public datasets and seven tumor types under zero-shot settings, with significant average Dice score improvements. The source code is publicly available.
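The paper's exact attention construction is in the released codebase; as a rough illustration of the general idea behind text-driven attention maps, the sketch below computes a cross-attention map between image-patch queries and text-token keys, then extracts the column for an anomaly-related token as a coarse localization map. All names (`anomaly_attention_map`, the toy tensor shapes) are hypothetical, not from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def anomaly_attention_map(patch_queries, token_keys, anomaly_token_idx, spatial_shape):
    """Cross-attention between image patches (queries) and text tokens (keys).

    The attention column belonging to an anomaly-describing token (e.g. "tumor")
    is reshaped into a spatial map and min-max normalized to [0, 1].
    """
    d = patch_queries.shape[-1]
    attn = softmax(patch_queries @ token_keys.T / np.sqrt(d), axis=-1)  # (patches, tokens)
    amap = attn[:, anomaly_token_idx].reshape(spatial_shape)
    return (amap - amap.min()) / (amap.max() - amap.min() + 1e-8)

# Toy example: 64 patches on an 8x8 grid, 4 text tokens, 32-dim embeddings.
rng = np.random.default_rng(0)
q = rng.standard_normal((64, 32))
k = rng.standard_normal((4, 32))
m = anomaly_attention_map(q, k, anomaly_token_idx=2, spatial_shape=(8, 8))
```

In the actual framework these queries and keys would come from a frozen medical diffusion model's cross-attention layers rather than random tensors, which is what lets the method stay open-vocabulary: any text prompt yields a map without retraining.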

📝 Abstract
We explore Generalizable Tumor Segmentation, aiming to train a single model for zero-shot tumor segmentation across diverse anatomical regions. Existing methods face limitations related to segmentation quality, scalability, and the range of applicable imaging modalities. In this paper, we uncover the potential of the internal representations within frozen medical foundation diffusion models as highly efficient zero-shot learners for tumor segmentation by introducing a novel framework named DiffuGTS. DiffuGTS creates anomaly-aware open-vocabulary attention maps based on text prompts to enable generalizable anomaly segmentation without being restricted by a predefined training category list. To further improve and refine anomaly segmentation masks, DiffuGTS leverages the diffusion model, transforming pathological regions into high-quality pseudo-healthy counterparts through latent space inpainting, and applies a novel pixel-level and feature-level residual learning approach, resulting in segmentation masks with significantly enhanced quality and generalization. Comprehensive experiments on four datasets and seven tumor categories demonstrate the superior performance of our method, surpassing current state-of-the-art models across multiple zero-shot settings. Codes are available at https://github.com/Yankai96/DiffuGTS.
Problem

Research questions and friction points this paper is trying to address.

Enabling zero-shot tumor segmentation across diverse anatomical regions
Overcoming limitations in segmentation quality and scalability
Expanding applicable imaging modalities for tumor detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses frozen medical foundation diffusion models
Creates anomaly-aware open-vocabulary attention maps
Applies pixel-level and feature-level residual learning
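The dual residual idea above can be sketched as follows: compare the input against its pseudo-healthy reconstruction at the pixel level (absolute intensity difference) and at the feature level (per-location embedding distance), then fuse the two normalized maps. This is a minimal toy version with assumed shapes and an equal-weight fusion; the function name `dual_residual_map` and all tensors are illustrative, not the paper's implementation.

```python
import numpy as np

def dual_residual_map(image, pseudo_healthy, feats, feats_healthy):
    """Fuse pixel-level and feature-level residuals into one anomaly score map."""
    # Pixel-level residual: absolute intensity difference.
    pix = np.abs(image - pseudo_healthy)
    # Feature-level residual: per-location L2 distance over the channel axis,
    # upsampled to image resolution by nearest-neighbor repetition.
    feat = np.linalg.norm(feats - feats_healthy, axis=-1)
    scale = image.shape[0] // feat.shape[0]
    feat_up = np.kron(feat, np.ones((scale, scale)))

    def norm(a):
        return (a - a.min()) / (a.max() - a.min() + 1e-8)

    return 0.5 * norm(pix) + 0.5 * norm(feat_up)

# Toy example: a 16x16 image whose "tumor" patch is zeroed out by inpainting.
rng = np.random.default_rng(0)
image = rng.random((16, 16))
pseudo_healthy = image.copy()
pseudo_healthy[4:8, 4:8] = 0.0          # simulated pseudo-healthy inpainted region
feats = rng.standard_normal((4, 4, 8))   # coarse feature grid for both images
score = dual_residual_map(image, pseudo_healthy, feats, feats)
```

The score is highest where the input and its pseudo-healthy counterpart disagree, i.e. inside the inpainted region, which is the intuition behind using residuals for anomaly localization and boundary refinement.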
👥 Authors
Yankai Jiang (Shanghai AI Laboratory)
Peng Zhang (Zhejiang University)
Donglin Yang (The University of British Columbia)
Yuan Tian (Shanghai AI Laboratory)
Hai Lin (Electrical Engineering, University of Notre Dame; Cyber-Physical Systems, Hybrid Dynamical Systems, Distributed Cooperative Systems)
Xiaosong Wang (Shanghai AI Laboratory; Medical Image Analysis, Computer Vision, Vision and Language)