QTSeg: A Query Token-Based Dual-Mix Attention Framework with Multi-Level Feature Distribution for Medical Image Segmentation

📅 2024-12-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Medical image segmentation faces two competing limitations: CNNs have limited capacity for modeling long-range dependencies, while Transformers carry high computational overhead, so existing methods struggle to balance accuracy and efficiency. To address this, we propose QTSeg, a CNN-Transformer hybrid architecture. Its key contributions are: (1) a query token-driven dual-mix attention decoder that jointly captures fine-grained local spatial details and global semantic dependencies; and (2) a multi-level feature distribution module that adaptively regulates multi-scale feature flow between encoder and decoder. By combining cross-, spatial-, and channel-wise attention mechanisms, QTSeg outperforms state-of-the-art methods on five publicly available datasets (lesion, polyp, breast cancer, cell, and retinal vessel segmentation) across multiple evaluation metrics while maintaining lower computational cost.

📝 Abstract

Medical image segmentation plays a crucial role in assisting healthcare professionals with accurate diagnoses and enabling automated diagnostic processes. Traditional convolutional neural networks (CNNs) often struggle with capturing long-range dependencies, while transformer-based architectures, despite their effectiveness, come with increased computational complexity. Recent efforts have focused on combining CNNs and transformers to balance performance and efficiency, but existing approaches still face challenges in achieving high segmentation accuracy while maintaining low computational costs. Furthermore, many methods underutilize the CNN encoder's capability to capture local spatial information, concentrating primarily on mitigating long-range dependency issues. To address these limitations, we propose QTSeg, a novel architecture for medical image segmentation that effectively integrates local and global information. QTSeg features a dual-mix attention decoder designed to enhance segmentation performance through: (1) a cross-attention mechanism for improved feature alignment, (2) a spatial attention module to capture long-range dependencies, and (3) a channel attention block to learn inter-channel relationships. Additionally, we introduce a multi-level feature distribution module, which adaptively balances feature propagation between the encoder and decoder, further boosting performance. Extensive experiments on five publicly available datasets covering diverse segmentation tasks, including lesion, polyp, breast cancer, cell, and retinal vessel segmentation, demonstrate that QTSeg outperforms state-of-the-art methods across multiple evaluation metrics while maintaining lower computational costs. Our implementation can be found at: https://github.com/tpnam0901/QTSeg (v1.0.0)
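The abstract names a channel attention block but does not describe its internals. As a rough illustration of what "learning inter-channel relationships" can look like, here is a minimal NumPy sketch of a generic squeeze-and-excitation-style channel gate. This is an assumption chosen for illustration, not QTSeg's actual block; the function name `channel_attention`, the reduction ratio `r`, and the random weights `w1` and `w2` are all hypothetical.

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Generic squeeze-and-excitation-style gate (illustrative, not QTSeg's block).

    feat: (C, H, W) feature map
    w1:   (C // r, C) hypothetical reduction weights
    w2:   (C, C // r) hypothetical expansion weights
    """
    squeezed = feat.mean(axis=(1, 2))              # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)        # ReLU bottleneck -> (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid weights in (0, 1) -> (C,)
    return feat * gate[:, None, None], gate        # rescale each channel

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
out, gate = channel_attention(feat, w1, w2)
```

Each channel is scaled by a single learned weight computed from the pooled statistics of all channels, which is one common way to let channels influence one another at low cost.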

Problem

Research questions and friction points this paper is trying to address.

Enhances medical image segmentation accuracy

Balances local and global information integration

Reduces computational costs in segmentation tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-mix attention decoder

Multi-level feature distribution

Cross-attention for feature alignment
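To make the cross-attention idea concrete, the following NumPy sketch shows a small set of query tokens attending over a flattened encoder feature map using standard scaled dot-product attention. It is a sketch under stated assumptions rather than the paper's implementation: the token count `Nq`, the dimensions, and the random projection matrices `wq`, `wk`, and `wv` are hypothetical, and the real decoder may differ in how queries are produced and used.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, features, wq, wk, wv):
    """Scaled dot-product cross-attention (illustrative only).

    queries:  (Nq, D) query tokens
    features: (N, D) flattened feature map, N = H * W
    """
    q = queries @ wq
    k = features @ wk
    v = features @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (Nq, N) similarity scores
    attn = softmax(scores, axis=-1)           # each query's weights over positions
    return attn @ v, attn                     # updated tokens: (Nq, D)

rng = np.random.default_rng(0)
D, H, W, Nq = 16, 8, 8, 4                     # hypothetical sizes
feature_map = rng.standard_normal((D, H, W))
features = feature_map.reshape(D, H * W).T    # (64, 16): one row per spatial position
queries = rng.standard_normal((Nq, D))
wq, wk, wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
updated, attn = cross_attention(queries, features, wq, wk, wv)
```

In this sketch the attention matrix has shape `(Nq, N)` rather than the `(N, N)` of full self-attention over all spatial positions, so its cost grows with the number of query tokens instead of quadratically with image size.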
