🤖 AI Summary
Medical image segmentation of minute tumors and micro-organs demands simultaneous modeling of fine-grained local detail and global contextual dependencies, yet existing shifted-window Vision Transformers (ViTs) fuse these two levels of features poorly. To address this, the authors propose HBFormer, a Hybrid-Bridge Transformer whose core innovation is a Multi-Scale Feature Fusion (MFF) decoder: an asymmetric bridge that integrates hierarchical features from a Swin backbone through a joint channel-spatial attention mechanism. Depthwise separable and dilated convolutions further expand receptive fields and improve computational efficiency. Evaluated on multiple public medical segmentation benchmarks (multi-organ, liver tumor, and bladder tumor), HBFormer achieves state-of-the-art performance, with notable gains in fine-grained boundary accuracy and long-range dependency modeling. Ablation studies confirm the contribution of each design component, and cross-dataset validation demonstrates robust generalization.
📝 Abstract
Medical image segmentation is a cornerstone of modern clinical diagnostics. While Vision Transformers that leverage shifted window-based self-attention have established new benchmarks in this field, they are often hampered by a critical limitation: their localized attention mechanism struggles to effectively fuse local details with global context. This deficiency is particularly detrimental to challenging tasks such as the segmentation of microtumors and miniature organs, where both fine-grained boundary definition and broad contextual understanding are paramount. To address this gap, we propose HBFormer, a novel Hybrid-Bridge Transformer architecture. The 'Hybrid' design of HBFormer synergizes a classic U-shaped encoder-decoder framework with a powerful Swin Transformer backbone for robust hierarchical feature extraction. The core innovation lies in its 'Bridge' mechanism, a sophisticated nexus for multi-scale feature integration. This bridge is architecturally embodied by our novel Multi-Scale Feature Fusion (MFF) decoder. Departing from conventional symmetric designs, the MFF decoder is engineered to fuse multi-scale features from the encoder with global contextual information. It achieves this through a synergistic combination of channel and spatial attention modules, which are constructed from a series of dilated and depthwise convolutions. These components work in concert to create a powerful feature bridge that explicitly captures long-range dependencies and refines object boundaries with exceptional precision. Comprehensive experiments on challenging medical image segmentation datasets, including multi-organ, liver tumor, and bladder tumor benchmarks, demonstrate that HBFormer achieves state-of-the-art results, showcasing its outstanding capabilities in microtumor and miniature organ segmentation. Code and models are available at: https://github.com/lzeeorno/HBFormer.
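The abstract names three building blocks for the MFF decoder: channel attention, spatial attention, and dilated/depthwise convolutions. The paper's actual implementation lives in the linked repository; as a purely illustrative sketch of what these generic operations do (all function names, shapes, and gating choices below are assumptions, not HBFormer's code), a minimal NumPy version might look like:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x):
    """Squeeze-and-excitation-style gate (assumed form).
    x: (C, H, W); each channel is rescaled by a global statistic."""
    gate = sigmoid(x.mean(axis=(1, 2)))      # (C,)
    return x * gate[:, None, None]

def spatial_attention(x):
    """Per-pixel gate from the channel-pooled map (assumed form).
    x: (C, H, W); every channel shares one (H, W) mask."""
    mask = sigmoid(x.mean(axis=0))           # (H, W)
    return x * mask[None, :, :]

def depthwise_dilated_conv(x, k, dilation=2):
    """3x3 depthwise convolution with dilation and 'same' padding.
    x: (C, H, W), k: (C, 3, 3) -- one filter per channel, so the
    receptive field grows with `dilation` at constant cost."""
    C, H, W = x.shape
    pad = dilation
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(3):
            for j in range(3):
                out[c] += k[c, i, j] * xp[c,
                    i * dilation:i * dilation + H,
                    j * dilation:j * dilation + W]
    return out
```

A fused decoder stage would compose these, e.g. `spatial_attention(channel_attention(depthwise_dilated_conv(x, k)))`, with learned kernels and gating weights in the real model.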