QTSeg: A Query Token-Based Dual-Mix Attention Framework with Multi-Level Feature Distribution for Medical Image Segmentation

📅 2024-12-23

🤖 AI Summary
Medical image segmentation faces two challenges. CNNs have limited ability to model long-range dependencies, and Transformers incur high computational overhead, so existing methods struggle to balance accuracy and efficiency. To address this, we propose QTSeg, a CNN-Transformer hybrid architecture. Our key contributions are: (1) a query token-based dual-mix attention decoder that captures both fine-grained local spatial details and global semantic dependencies; and (2) a multi-level feature distribution module that adaptively balances multi-scale feature propagation between the encoder and decoder. The decoder combines cross-attention (for feature alignment), spatial attention (for long-range dependencies), and channel attention (for inter-channel relationships). On five public datasets covering lesion, polyp, breast cancer, cell, and retinal vessel segmentation, QTSeg outperforms state-of-the-art methods across multiple evaluation metrics while maintaining lower computational costs.
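To illustrate what "query token-based" attention means in general terms, the sketch below shows plain scaled dot-product cross-attention, where a small set of query tokens attends to encoder features at every spatial position. This is a generic, hedged illustration and not QTSeg's decoder: the function names are hypothetical, the learned query/key/value projections and multiple heads of a real model are omitted, and the authors' actual implementation is in the linked repository.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention (hypothetical, projection-free sketch).

    queries: Q query-token vectors, each of length d.
    keys, values: N encoder-feature vectors, one per spatial position.
    Returns Q output vectors, each a convex combination of the value vectors.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in queries:
        # Similarity between this query token and every encoder position.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one feature dimension at a time.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs
```

A zero query scores every position equally, so its output is the mean of the values; a query aligned with one key concentrates almost all of its weight on that position's value.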

📝 Abstract
Medical image segmentation plays a crucial role in assisting healthcare professionals with accurate diagnoses and enabling automated diagnostic processes. Traditional convolutional neural networks (CNNs) often struggle with capturing long-range dependencies, while transformer-based architectures, despite their effectiveness, come with increased computational complexity. Recent efforts have focused on combining CNNs and transformers to balance performance and efficiency, but existing approaches still face challenges in achieving high segmentation accuracy while maintaining low computational costs. Furthermore, many methods underutilize the CNN encoder's capability to capture local spatial information, concentrating primarily on mitigating long-range dependency issues. To address these limitations, we propose QTSeg, a novel architecture for medical image segmentation that effectively integrates local and global information. QTSeg features a dual-mix attention decoder designed to enhance segmentation performance through: (1) a cross-attention mechanism for improved feature alignment, (2) a spatial attention module to capture long-range dependencies, and (3) a channel attention block to learn inter-channel relationships. Additionally, we introduce a multi-level feature distribution module, which adaptively balances feature propagation between the encoder and decoder, further boosting performance. Extensive experiments on five publicly available datasets covering diverse segmentation tasks, including lesion, polyp, breast cancer, cell, and retinal vessel segmentation, demonstrate that QTSeg outperforms state-of-the-art methods across multiple evaluation metrics while maintaining lower computational costs. Our implementation can be found at: https://github.com/tpnam0901/QTSeg (v1.0.0)
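The abstract does not specify how the multi-level feature distribution module "adaptively balances" feature propagation. One common way to let a network learn such a balance is softmax-weighted fusion, sketched below under explicit assumptions: each decoder stage holds one mixing logit per encoder level (learnable in a real model, fixed numbers here), and encoder features are assumed to be already resized to a common shape and flattened. The names `distribute_features` and `stage_logits` are hypothetical, and this is not claimed to be the paper's module.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def distribute_features(encoder_levels, stage_logits):
    """Softmax-weighted fusion of encoder levels for each decoder stage (hypothetical).

    encoder_levels: L feature vectors, assumed already resized to a common
        shape and flattened (a simplification for illustration).
    stage_logits: S lists of L mixing logits, one list per decoder stage.
    Returns S fused feature vectors, one per decoder stage.
    """
    dim = len(encoder_levels[0])
    fused = []
    for logits in stage_logits:
        # Non-negative weights summing to 1 across encoder levels.
        weights = softmax(logits)
        fused.append([sum(w * level[j] for w, level in zip(weights, encoder_levels))
                      for j in range(dim)])
    return fused
```

Equal logits average all levels, while a strongly dominant logit routes essentially one encoder level to that stage; training the logits lets each stage find its own mix.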
Problem

Enhances medical image segmentation accuracy
Balances local and global information integration
Reduces computational costs in segmentation tasks
Innovation

Dual-mix attention decoder
Multi-level feature distribution
Cross-attention for feature alignment
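The dual-mix decoder's channel attention block is described only as learning inter-channel relationships. For a concrete picture of what channel attention typically does, the sketch below follows the widely used squeeze-and-excitation pattern: global average pooling per channel, a small bottleneck MLP, and a sigmoid gate that rescales each channel. This is a swapped-in, generic technique shown as an assumption, not QTSeg's actual block; the weight matrices `w1` and `w2` are hypothetical stand-ins for learned parameters.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feature_map, w1, w2):
    """Squeeze-and-excitation style channel attention (generic sketch).

    feature_map: C channels, each a flat list of H*W activations.
    w1: C_r x C reduction weights; w2: C x C_r expansion weights.
    Returns the feature map with each channel scaled by a gate in (0, 1).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [sum(ch) / len(ch) for ch in feature_map]
    # Excitation: bottleneck layer with ReLU, then one sigmoid gate per channel.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Rescale every activation in a channel by that channel's gate.
    return [[g * x for x in ch] for g, ch in zip(gates, feature_map)]
```

With zero expansion weights every gate is 0.5 and all channels are halved; large positive or negative weights push a channel's gate toward 1 (kept) or 0 (suppressed).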