FocusNet: Transformer-enhanced Polyp Segmentation with Local and Pooling Attention

📅 2025-04-18
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address the poor robustness of colonoscopic polyp segmentation in multi-center, multi-modal clinical settings, this paper proposes a novel Transformer-based segmentation framework. Methodologically, it introduces three key innovations: (1) a Focused Attention Module (FAM) that uniquely integrates local attention with pooling-based attention; (2) a synergistic architecture comprising a Cross-Semantic Interaction Decoder (CIDM) and a Detail Enhancement Module (DEM), enabling unified global context modeling while preserving fine-grained local texture; and (3) native support for joint training and inference across five endoscopic modalities: BLI, FICE, LCI, NBI, and WLI. Evaluated on the multi-center, multi-modal PolypDB benchmark, the method achieves state-of-the-art Dice scores of 93.42% (WLI) and 92.04% (LCI), outperforming all prior approaches. The source code is publicly available.

๐Ÿ“ Abstract
Colonoscopy is vital in the early diagnosis of colorectal polyps. Regular screenings can effectively prevent benign polyps from progressing to CRC. While deep learning has made impressive strides in polyp segmentation, most existing models are trained on single-modality and single-center data, making them less effective in real-world clinical environments. To overcome these limitations, we propose FocusNet, a Transformer-enhanced focus attention network designed to improve polyp segmentation. FocusNet incorporates three essential modules: the Cross-semantic Interaction Decoder Module (CIDM) for generating coarse segmentation maps, the Detail Enhancement Module (DEM) for refining shallow features, and the Focus Attention Module (FAM) for balancing local detail and global context through local and pooling attention mechanisms. We evaluate our model on PolypDB, a newly introduced dataset with multi-modality and multi-center data for building more reliable segmentation methods. Extensive experiments show that FocusNet consistently outperforms existing state-of-the-art approaches, with high Dice coefficients of 82.47% on BLI, 88.46% on FICE, 92.04% on LCI, 82.09% on NBI, and 93.42% on WLI, demonstrating its accuracy and robustness across five different modalities. The source code for FocusNet is available at https://github.com/JunZengz/FocusNet.
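The page does not reproduce the paper's implementation, but the core FAM idea, combining window-restricted local attention (fine detail) with attention over pooled tokens (cheap global context), can be illustrated with a minimal NumPy sketch. All function names, the window/pool sizes, and the additive fusion below are assumptions for illustration, not the authors' code:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_attention(x, window=4):
    """Self-attention restricted to non-overlapping local windows.

    x: (n, d) array of flattened patch features.
    """
    n, d = x.shape
    out = np.zeros_like(x)
    for i in range(0, n, window):
        w = x[i:i + window]                   # tokens in one local window
        scores = w @ w.T / np.sqrt(d)         # window-restricted attention scores
        out[i:i + window] = softmax(scores) @ w
    return out

def pooling_attention(x, pool=4):
    """Attention where keys/values are average-pooled tokens (global context).

    Assumes n is divisible by `pool` for simplicity.
    """
    n, d = x.shape
    pooled = x.reshape(n // pool, pool, d).mean(axis=1)
    scores = x @ pooled.T / np.sqrt(d)        # each token attends to pooled summary
    return softmax(scores) @ pooled

def focused_attention(x, window=4, pool=4):
    # Simple additive fusion of local detail and pooled global context;
    # the actual FAM fusion strategy may differ.
    return local_attention(x, window) + pooling_attention(x, pool)
```

Pooling the key/value tokens shrinks the attention map from n×n to n×(n/pool), which is why this style of attention scales better to high-resolution endoscopic frames than full global attention.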
Problem

Research questions and friction points this paper is trying to address.

Improves polyp segmentation in colonoscopy using Transformer-enhanced FocusNet
Addresses limitations of single-modality, single-center data in existing models
Balances local detail and global context for robust segmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-enhanced focus attention network
Cross-semantic Interaction Decoder Module
Local and pooling attention mechanisms
Jun Zeng
University of California, Berkeley
Robotics

KC Santosh
University of South Dakota, Vermillion, SD, USA

Deepak Rajan Nayak
Malaviya National Institute of Technology Jaipur, Rajasthan, India

Thomas de Lange
Dept of Med., Sahlgrenska Univ. Hosp., Sahlgrenska Academy GU, Augere Medical
endoscopy education, endoscopy, artificial intelligence, colorectal cancer screening

J. Varkey
Harvard Medical School, Boston, MA, USA

Tyler M. Berzin
Harvard Medical School, Boston, MA, USA

Debesh Jha
University of South Dakota
Deep Learning, Biomedical Informatics, Medical Image Computing, Computer Vision, AI for Medicine