HBFormer: A Hybrid-Bridge Transformer for Microtumor and Miniature Organ Segmentation

📅 2025-12-03
🤖 AI Summary
Segmenting minute tumors and micro-organs in medical images requires modeling fine-grained local details and global contextual dependencies at the same time, yet existing shifted-window Vision Transformers (ViTs) fuse these two kinds of features poorly. To address this, we propose HBFormer, a Hybrid-Bridge Transformer whose core innovation is a Multi-Scale Feature Fusion (MFF) decoder: an asymmetric bridge that integrates hierarchical features from a Swin Transformer backbone through coupled channel and spatial attention. Dilated and depth-wise convolutions are integrated to expand receptive fields and improve computational efficiency. Evaluated on multiple public medical segmentation benchmarks, HBFormer achieves state-of-the-art performance, with improvements in fine-grained boundary accuracy and long-range dependency modeling. Ablation studies confirm the efficacy of each design component, and cross-dataset validation underscores its robust generalizability.

📝 Abstract
Medical image segmentation is a cornerstone of modern clinical diagnostics. While Vision Transformers that leverage shifted window-based self-attention have established new benchmarks in this field, they are often hampered by a critical limitation: their localized attention mechanism struggles to effectively fuse local details with global context. This deficiency is particularly detrimental to challenging tasks such as the segmentation of microtumors and miniature organs, where both fine-grained boundary definition and broad contextual understanding are paramount. To address this gap, we propose HBFormer, a novel Hybrid-Bridge Transformer architecture. The 'Hybrid' design of HBFormer synergizes a classic U-shaped encoder-decoder framework with a powerful Swin Transformer backbone for robust hierarchical feature extraction. The core innovation lies in its 'Bridge' mechanism, a sophisticated nexus for multi-scale feature integration. This bridge is architecturally embodied by our novel Multi-Scale Feature Fusion (MFF) decoder. Departing from conventional symmetric designs, the MFF decoder is engineered to fuse multi-scale features from the encoder with global contextual information. It achieves this through a synergistic combination of channel and spatial attention modules, which are constructed from a series of dilated and depth-wise convolutions. These components work in concert to create a powerful feature bridge that explicitly captures long-range dependencies and refines object boundaries with exceptional precision. Comprehensive experiments on challenging medical image segmentation datasets, including multi-organ, liver tumor, and bladder tumor benchmarks, demonstrate that HBFormer achieves state-of-the-art results, showcasing its outstanding capabilities in microtumor and miniature organ segmentation. Code and models are available at: https://github.com/lzeeorno/HBFormer.
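The abstract describes the bridge as a combination of channel and spatial attention. As a rough illustration of that general idea only, the sketch below applies a parameter-free channel gate followed by a spatial gate to a tiny feature map in plain Python. The function names (`channel_gate`, `spatial_gate`) and the mean-plus-sigmoid gating are assumptions made for this example; the actual MFF modules are learned and built from dilated and depth-wise convolutions, and they are not reproduced here.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_gate(feat):
    """Rescale each channel by a single weight derived from its global mean.

    feat is a list of C channels, each an H x W nested list of floats.
    Hypothetical, parameter-free stand-in for a learned channel attention.
    """
    out = []
    for channel in feat:
        values = [v for row in channel for v in row]
        weight = sigmoid(sum(values) / len(values))  # one weight per channel
        out.append([[v * weight for v in row] for row in channel])
    return out


def spatial_gate(feat):
    """Rescale each pixel location by a weight derived from its cross-channel mean.

    Hypothetical, parameter-free stand-in for a learned spatial attention.
    """
    num_channels = len(feat)
    height = len(feat[0])
    width = len(feat[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(num_channels)]
    for i in range(height):
        for j in range(width):
            mean = sum(feat[c][i][j] for c in range(num_channels)) / num_channels
            weight = sigmoid(mean)  # one weight per spatial location
            for c in range(num_channels):
                out[c][i][j] = feat[c][i][j] * weight
    return out


if __name__ == "__main__":
    # Two channels of size 2 x 2: one active channel, one all-zero channel.
    features = [
        [[1.0, 2.0], [3.0, 4.0]],
        [[0.0, 0.0], [0.0, 0.0]],
    ]
    gated = spatial_gate(channel_gate(features))
    for channel in gated:
        print(channel)
```

Both gates keep the feature map's shape and only rescale values, which is why such modules can be dropped between encoder and decoder stages without changing tensor dimensions.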
Problem

Research questions and friction points this paper is trying to address.

Segments microtumors and miniature organs in medical images
Fuses local details with global context for segmentation
Improves boundary definition in challenging medical image tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid U-shaped encoder-decoder with Swin Transformer backbone
Multi-scale feature fusion bridge using channel and spatial attention
Dilated and depth-wise convolutions for long-range dependencies
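The role of dilated and depth-wise convolutions can be made concrete with standard arithmetic. The sketch below computes the effective kernel size and stacked receptive field of dilated convolutions, and compares weight counts for a standard convolution against a depth-wise convolution followed by a 1x1 pointwise convolution. The pairing with a pointwise convolution, the layer configuration, and the channel width of 64 are illustrative assumptions, not values taken from the paper.

```python
def effective_kernel(kernel, dilation):
    """Span covered by one dilated kernel along one axis."""
    return dilation * (kernel - 1) + 1


def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    layers is a list of (kernel, dilation) pairs, applied in order.
    """
    rf = 1
    for kernel, dilation in layers:
        rf += dilation * (kernel - 1)
    return rf


def standard_conv_params(c_in, c_out, kernel):
    """Weight count of a standard 2D convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(c_in, c_out, kernel):
    """Weight count of a depth-wise conv plus a 1x1 pointwise conv (bias ignored)."""
    return kernel * kernel * c_in + c_in * c_out


if __name__ == "__main__":
    # Hypothetical stack: three 3x3 convolutions with dilations 1, 2, 4.
    print(effective_kernel(3, 2))                      # 5
    print(receptive_field([(3, 1), (3, 2), (3, 4)]))   # 15
    # Hypothetical 64 -> 64 channel layer with a 3x3 kernel.
    print(standard_conv_params(64, 64, 3))             # 36864
    print(depthwise_separable_params(64, 64, 3))       # 4672
```

Under these assumptions, three dilated 3x3 layers cover a 15-pixel span where three undilated ones would cover 7, and the depth-wise variant needs roughly one eighth of the weights of the standard convolution.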
Fuchen Zheng
University of Macau
Xinyi Chen
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Weixuan Li
University of Florida
Mechanical engineering, Material science, Molecular dynamics, Multiscale simulation
Quanjun Li
Guangdong University of Technology
Junhua Zhou
Guangdong University of Technology
Xiaojiao Guo
Unknown affiliation
Xuhang Chen
Huizhou University
computational imaging, low-level vision, computational photography
Chi-Man Pun
Professor of Computer and Information Science, University of Macau
Image Processing, Pattern Recognition, Multimedia and AI Security, Medical Image Analysis
Shoujun Zhou
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences