Focal Modulation and Bidirectional Feature Fusion Network for Medical Image Segmentation

📅 2025-10-23
🤖 AI Summary
Medical image segmentation faces two key challenges: insufficient global context modeling and difficulty in representing multi-scale anatomical structures. Conventional CNNs, constrained by local receptive fields, struggle to accurately segment lesions with complex boundaries or highly variable sizes. To address these limitations, we propose a convolutional-Transformer hybrid architecture featuring: (1) focal modulation attention, which enhances long-range dependency modeling and selective focus on discriminative regions; and (2) a bidirectional cross-scale feature fusion module enabling efficient, symmetric information exchange between encoder and decoder pathways. Evaluated on eight benchmark medical image segmentation datasets, our method achieves state-of-the-art performance, consistently outperforming existing approaches in both Jaccard and Dice coefficients. Ablation studies confirm its superior boundary precision, scale robustness, and generalization capability across diverse anatomical domains and imaging modalities.
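The focal modulation attention described above can be illustrated with a small sketch. This is not the paper's implementation: it is a minimal NumPy illustration of the general focal-modulation idea (hierarchical contexts at growing scales, gated into a single modulator that multiplies a query projection), with random projection weights and simple box-filter averaging standing in for learned depthwise convolutions.

```python
import numpy as np

def _avg_window(f, k):
    """Mean over a (2k+1) x (2k+1) window at every position (edge padding)."""
    H, W, C = f.shape
    p = np.pad(f, ((k, k), (k, k), (0, 0)), mode="edge")
    out = np.empty_like(f)
    for i in range(H):
        for j in range(W):
            out[i, j] = p[i:i + 2 * k + 1, j:j + 2 * k + 1].reshape(-1, C).mean(0)
    return out

def focal_modulation(x, num_levels=2, seed=0):
    """Sketch of focal modulation on an (H, W, C) feature map.

    Contexts at increasing spatial scales (plus a global level) are
    gated per location, summed into a modulator, projected, and
    multiplied element-wise with a query projection of the input.
    All projection weights are random here, purely for illustration.
    """
    rng = np.random.default_rng(seed)
    H, W, C = x.shape
    Wq = rng.standard_normal((C, C)) / np.sqrt(C)              # query projection
    Wh = rng.standard_normal((C, C)) / np.sqrt(C)              # modulator projection
    Wg = rng.standard_normal((C, num_levels + 1)) / np.sqrt(C)  # per-level gates

    q = x @ Wq            # queries, (H, W, C)
    gates = x @ Wg        # per-location gates, (H, W, num_levels + 1)

    # hierarchical contexts: growing local windows, then one global level
    contexts = [_avg_window(x, k=2 ** l) for l in range(num_levels)]
    contexts.append(np.broadcast_to(x.mean(axis=(0, 1)), x.shape))

    modulator = sum(gates[..., l:l + 1] * c for l, c in enumerate(contexts))
    return q * (modulator @ Wh)
```

The key property is that each output location mixes information from progressively larger neighborhoods, which is how focal modulation models long-range dependencies while keeping attention to discriminative local regions.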

📝 Abstract
Medical image segmentation is essential for clinical applications such as disease diagnosis, treatment planning, and disease progression monitoring because it provides precise morphological and spatial information about anatomical structures that directly informs treatment decisions. Convolutional neural networks have advanced image segmentation considerably; however, because convolution operations are local, they still struggle to capture global contextual information and long-range dependencies. This limitation restricts their ability to precisely segment structures with complicated borders and widely varying sizes. Transformers, by contrast, use self-attention to capture global context and long-range dependencies efficiently, which makes hybrid transformer-CNN architectures a promising way to overcome these limitations. We therefore propose the Focal Modulation and Bidirectional Feature Fusion Network for Medical Image Segmentation, referred to as FM-BFF-Net in the remainder of this paper. The network combines convolutional and transformer components, employs a focal modulation attention mechanism to refine context awareness, and introduces a bidirectional feature fusion module that enables efficient interaction between encoder and decoder representations across scales. Through this design, FM-BFF-Net improves boundary precision and robustness to variations in lesion size, shape, and contrast. Extensive experiments on eight publicly available datasets, covering polyp detection, skin lesion segmentation, and ultrasound imaging, show that FM-BFF-Net consistently surpasses recent state-of-the-art methods in Jaccard index and Dice coefficient, confirming its effectiveness and adaptability across diverse medical imaging scenarios.
Problem

Research questions and friction points this paper is trying to address.

Overcoming limitations in capturing global context in medical image segmentation
Improving segmentation of structures with complex borders and varying sizes
Enhancing boundary precision and robustness to lesion variations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines convolutional and transformer components for segmentation
Uses focal modulation attention to enhance context awareness
Introduces bidirectional feature fusion for cross-scale interactions
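The bidirectional cross-scale fusion idea from the last bullet can be sketched as follows. The paper's exact fusion design is not specified here, so this NumPy sketch assumes a generic symmetric exchange: a shallow (high-resolution) map receives an upsampled copy of a deep (low-resolution) map, while the deep map receives a downsampled copy of the shallow one; a fixed blend weight `alpha` stands in for the paper's learned fusion.

```python
import numpy as np

def upsample2x(f):
    """Nearest-neighbour 2x upsampling of an (H, W, C) map."""
    return f.repeat(2, axis=0).repeat(2, axis=1)

def downsample2x(f):
    """2x2 average-pool downsampling of an (H, W, C) map."""
    H, W, C = f.shape
    return f.reshape(H // 2, 2, W // 2, 2, C).mean(axis=(1, 3))

def bidirectional_fuse(shallow, deep, alpha=0.5):
    """Symmetric cross-scale exchange between a shallow map (2H, 2W, C)
    and a deep map (H, W, C).

    Each branch is blended with a resampled copy of the other, so
    information flows both encoder-to-decoder (top-down) and
    decoder-to-encoder (bottom-up). The scalar blend is an
    illustrative simplification of a learned fusion module.
    """
    shallow_out = (1 - alpha) * shallow + alpha * upsample2x(deep)
    deep_out = (1 - alpha) * deep + alpha * downsample2x(shallow)
    return shallow_out, deep_out
```

With `alpha=0` each branch passes through unchanged; increasing `alpha` strengthens the cross-scale exchange, which is the mechanism the paper credits for its scale robustness.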