Autoregressive Medical Image Segmentation via Next-Scale Mask Prediction

📅 2025-02-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing medical image segmentation methods have limited capacity for multi-scale feature modeling, particularly for capturing cross-scale dependencies, which leads to suboptimal performance in complex anatomical regions. To address this, we propose AR-Seg, an autoregressive segmentation framework based on next-scale mask prediction. A multi-scale mask autoencoder quantizes the mask into multi-scale token maps, and a next-scale autoregressive mechanism predicts each finer-scale mask conditioned on all previous scales rather than only the immediate predecessor. A consensus-aggregation strategy then combines multiple sampled results to improve robustness. The framework yields a coarse-to-fine segmentation whose intermediate steps can be visualized. On two benchmark datasets with different imaging modalities, AR-Seg is reported to outperform state-of-the-art methods.

📝 Abstract
While deep learning has significantly advanced medical image segmentation, most existing methods still struggle with handling complex anatomical regions. Cascaded or deep supervision-based approaches attempt to address this challenge through multi-scale feature learning but fail to establish sufficient inter-scale dependencies, as each scale relies solely on the features of the immediate predecessor. To this end, we propose the AutoRegressive Segmentation framework via next-scale mask prediction, termed AR-Seg, which progressively predicts the next-scale mask by explicitly modeling dependencies across all previous scales within a unified architecture. AR-Seg introduces three innovations: (1) a multi-scale mask autoencoder that quantizes the mask into multi-scale token maps to capture hierarchical anatomical structures, (2) a next-scale autoregressive mechanism that progressively predicts next-scale masks to enable sufficient inter-scale dependencies, and (3) a consensus-aggregation strategy that combines multiple sampled results to generate a more accurate mask, further improving segmentation robustness. Extensive experimental results on two benchmark datasets with different modalities demonstrate that AR-Seg outperforms state-of-the-art methods while explicitly visualizing the intermediate coarse-to-fine segmentation process.
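The coarse-to-fine structure described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the learned mask autoencoder and its quantizer are replaced here by plain block-majority pooling, and the names `downsample_majority`, `build_scale_pyramid`, and `context_for_scale` are hypothetical. The sketch only shows two ideas: a mask represented as a sequence of maps from coarse to fine, and a prediction context for scale k that contains every coarser scale rather than just scale k-1.

```python
from typing import List

Mask = List[List[int]]  # square binary mask stored as rows of 0/1


def downsample_majority(mask: Mask, size: int) -> Mask:
    """Pool a square binary mask to size x size by block majority vote.

    Assumption: this stands in for the paper's learned multi-scale
    quantization, which the abstract does not specify in detail.
    """
    n = len(mask)
    if n % size != 0:
        raise ValueError("sketch assumes the side length is divisible by size")
    block = n // size
    pooled = []
    for r in range(size):
        row = []
        for c in range(size):
            total = sum(
                mask[r * block + i][c * block + j]
                for i in range(block)
                for j in range(block)
            )
            # A block is foreground when at least half of its pixels are.
            row.append(1 if 2 * total >= block * block else 0)
        pooled.append(row)
    return pooled


def build_scale_pyramid(mask: Mask, sizes: List[int]) -> List[Mask]:
    """Return one map per scale, ordered from coarsest to finest."""
    return [downsample_majority(mask, s) for s in sizes]


def context_for_scale(pyramid: List[Mask], k: int) -> List[int]:
    """Flatten every map coarser than scale k into one prefix sequence.

    A cascaded design would use only pyramid[k - 1]; the next-scale
    autoregressive view conditions on pyramid[0] through pyramid[k - 1].
    """
    return [value for m in pyramid[:k] for row in m for value in row]


if __name__ == "__main__":
    toy_mask = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 1, 1, 1],
        [0, 0, 1, 1],
    ]
    pyramid = build_scale_pyramid(toy_mask, sizes=[1, 2, 4])
    for k, scale_map in enumerate(pyramid):
        print(f"scale {k}: context length {len(context_for_scale(pyramid, k))}")
        print(scale_map)
```

In an actual model, a learned predictor would take the image features together with `context_for_scale(pyramid, k)` and output the tokens of scale k; that predictor is deliberately left out of this sketch.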
Problem

Research questions and friction points this paper is trying to address.

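Existing methods struggle with complex anatomical regions in medical image segmentation
Cascaded and deep supervision-based approaches fail to establish sufficient inter-scale dependencies, because each scale relies only on its immediate predecessor
Segmentation robustness and accuracy need improvement, which the paper targets by aggregating multiple sampled results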
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-scale mask autoencoder captures hierarchical structures
Next-scale autoregressive mechanism predicts subsequent masks
Consensus-aggregation strategy enhances segmentation robustness
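The consensus-aggregation idea can be sketched as a pixel-wise vote over several sampled masks. The summary above does not state the exact aggregation rule, so the voting scheme, the `threshold` parameter, and the function name `consensus_mask` are assumptions made for illustration only.

```python
from typing import List

Mask = List[List[int]]  # binary mask stored as rows of 0/1


def consensus_mask(samples: List[Mask], threshold: float = 0.5) -> Mask:
    """Combine sampled binary masks by pixel-wise voting.

    A pixel is foreground when the fraction of samples marking it as
    foreground is strictly greater than `threshold`. Assumption: simple
    voting stands in for the paper's consensus-aggregation strategy.
    """
    if not samples:
        raise ValueError("need at least one sampled mask")
    height, width = len(samples[0]), len(samples[0][0])
    for sample in samples:
        if len(sample) != height or any(len(row) != width for row in sample):
            raise ValueError("all sampled masks must share one shape")
    count = len(samples)
    return [
        [
            1 if sum(s[r][c] for s in samples) / count > threshold else 0
            for c in range(width)
        ]
        for r in range(height)
    ]


if __name__ == "__main__":
    sampled = [
        [[1, 0], [1, 1]],
        [[1, 0], [0, 1]],
        [[0, 0], [1, 1]],
    ]
    print(consensus_mask(sampled))
```

Raising `threshold` keeps only pixels on which most samples agree, which trades recall for precision; the default of 0.5 corresponds to a plain majority vote.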
Authors

Tao Chen
Institute of Science and Technology for Brain-inspired Intelligence, Fudan University
Chenhui Wang
PhD Candidate, Fudan University
AI for Neuroscience, Computer Vision
Zhihao Chen
Institute of Science and Technology for Brain-inspired Intelligence, Fudan University
Hongming Shan
Fudan University; Rensselaer Polytechnic Institute
Machine Learning, Medical Imaging, Computer Vision