Primus: Enforcing Attention Usage for 3D Medical Image Segmentation

📅 2025-03-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Current Transformer-based models for 3D medical image segmentation rely heavily on convolutional modules; in several hybrid architectures, removing the Transformer component causes no performance degradation, revealing that the existing attention mechanisms contribute little. Method: The authors propose Primus, a fully Transformer-based architecture for 3D medical segmentation: it eliminates convolutions entirely and introduces voxel-level high-resolution tokenization, enhanced 3D positional embeddings, and an updated self-attention block design so that pure attentional modeling carries the segmentation task. Contribution/Results: Extensive experiments show that Primus outperforms prior Transformer-based methods on multiple mainstream 3D medical segmentation benchmarks (e.g., AMOS, BTCV, MSD) while remaining competitive with state-of-the-art convolutional models. Primus thus establishes a purely attention-driven paradigm in this domain and sets a new Transformer-based baseline for 3D medical image segmentation.
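The voxel-level high-resolution tokenization mentioned above can be sketched as follows: a 3D volume is split into small non-overlapping patches, and each flattened patch is linearly projected to a token embedding. The patch size and embedding dimension below are illustrative placeholders, not the paper's actual configuration.

```python
import numpy as np

def tokenize_volume(volume, patch=4, embed_dim=96, rng=None):
    """Split a 3D volume into non-overlapping patch tokens and linearly
    project each flattened patch to embed_dim.
    patch and embed_dim are hypothetical values for illustration."""
    rng = rng or np.random.default_rng(0)
    D, H, W = volume.shape
    assert D % patch == 0 and H % patch == 0 and W % patch == 0
    # (D/p, p, H/p, p, W/p, p) -> (n_tokens, p**3)
    v = volume.reshape(D // patch, patch, H // patch, patch, W // patch, patch)
    v = v.transpose(0, 2, 4, 1, 3, 5).reshape(-1, patch ** 3)
    # Random projection stands in for the learned embedding layer.
    W_e = rng.standard_normal((patch ** 3, embed_dim)) / np.sqrt(patch ** 3)
    return v @ W_e  # (n_tokens, embed_dim)

tokens = tokenize_volume(np.zeros((32, 32, 32)), patch=4, embed_dim=96)
print(tokens.shape)  # (512, 96)
```

A smaller patch yields more tokens per volume, which is the sense in which the tokenization is "high-resolution": less spatial detail is collapsed into each token, at the cost of a longer attention sequence.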

📝 Abstract
Transformers have achieved remarkable success across multiple fields, yet their impact on 3D medical image segmentation remains limited, with convolutional networks still dominating major benchmarks. In this work, we a) analyze current Transformer-based segmentation models and identify critical shortcomings, particularly their over-reliance on convolutional blocks. Further, we demonstrate that in some architectures, performance is unaffected by the absence of the Transformer, revealing their limited effectiveness. To address these challenges, we move away from hybrid architectures and b) introduce a fully Transformer-based segmentation architecture, termed Primus. Primus leverages high-resolution tokens, combined with advances in positional embeddings and block design, to make full use of its Transformer blocks. Through these adaptations Primus surpasses current Transformer-based methods and competes with state-of-the-art convolutional models on multiple public datasets. By doing so, we create the first pure Transformer architecture and take a significant step towards making Transformers state-of-the-art for 3D medical image segmentation.
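The "block design" the abstract refers to is the standard Transformer unit that Primus relies on exclusively, since no convolutional blocks remain. A minimal pre-norm block (single head, no dropout) looks like the sketch below; Primus's specific block modifications are not reproduced here, this only shows the generic structure.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def transformer_block(x, Wq, Wk, Wv, Wo, W1, W2):
    """One generic pre-norm Transformer block: self-attention residual
    followed by an MLP residual. Single-head, illustrative only."""
    h = layer_norm(x)
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    x = x + (attn @ v) @ Wo              # attention residual
    h = layer_norm(x)
    x = x + np.maximum(h @ W1, 0) @ W2   # MLP residual (ReLU)
    return x

rng = np.random.default_rng(0)
d = 96
x = rng.standard_normal((512, d))            # 512 voxel-patch tokens
Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) * 0.02 for _ in range(4))
W1 = rng.standard_normal((d, 4 * d)) * 0.02
W2 = rng.standard_normal((4 * d, d)) * 0.02
y = transformer_block(x, Wq, Wk, Wv, Wo, W1, W2)
print(y.shape)  # (512, 96)
```

In a hybrid architecture, convolutional stages can compensate when such blocks learn little; in a pure design like Primus, every representational improvement must come from stacks of blocks of this kind, which is why their design matters.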
Problem

Research questions and friction points this paper is trying to address.

Limited impact of Transformers on 3D medical image segmentation despite their success elsewhere
Over-reliance of hybrid architectures on convolutional blocks, with attention contributing little
Designing a pure Transformer architecture (Primus) that removes this reliance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fully Transformer-based, convolution-free segmentation architecture
High-resolution voxel-level tokenization
Improved 3D positional embeddings and Transformer block design
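Since a pure Transformer has no convolutions to encode spatial layout, each token needs an embedding of its 3D grid position. One common scheme, shown below as an illustration, concatenates per-axis sinusoidal embeddings; Primus's actual positional-embedding design may differ from this sketch.

```python
import numpy as np

def sinusoidal_1d(positions, dim):
    """Standard sinusoidal embedding for a single axis (dim must be even)."""
    i = np.arange(dim // 2)
    freqs = 1.0 / (10000 ** (2 * i / dim))
    ang = positions[:, None] * freqs[None, :]
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)

def positional_embedding_3d(d, h, w, dim):
    """Concatenate per-axis sinusoidal embeddings for a d*h*w token grid.
    dim must be divisible by 3 (one third of the channels per axis)."""
    assert dim % 3 == 0
    third = dim // 3
    pe_d = sinusoidal_1d(np.arange(d), third)
    pe_h = sinusoidal_1d(np.arange(h), third)
    pe_w = sinusoidal_1d(np.arange(w), third)
    grid = np.zeros((d, h, w, dim))
    grid[..., :third] = pe_d[:, None, None, :]          # depth axis
    grid[..., third:2 * third] = pe_h[None, :, None, :]  # height axis
    grid[..., 2 * third:] = pe_w[None, None, :, :]       # width axis
    return grid.reshape(d * h * w, dim)

pe = positional_embedding_3d(8, 8, 8, 96)
print(pe.shape)  # (512, 96)
```

The embedding grid matches the token grid produced by patchifying the volume, so it can simply be added to the token embeddings before the first attention block.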