🤖 AI Summary
Existing U-Net variants struggle to capture the geometric structure of polyps in low-contrast or cluttered colonoscopy images, which limits their segmentation performance. To address this, the work proposes a plug-and-play Geometric Prior-guided Module (GPM) that injects explicit geometric priors, in the form of depth maps, into U-Net-based polyp segmentation. A Visual Geometry Grounded Transformer (VGGT), fine-tuned on a simulated ColonDepth dataset, generates endoscopy-specific depth maps; GPM then encodes them into the U-Net encoder's feature maps and refines the result with spatial and channel attention, which emphasize local spatial and global channel information respectively. Experiments on five public polyp segmentation datasets show consistent gains over three strong baselines, indicating that the module is compatible with diverse U-Net variants.
📝 Abstract
Accurate and robust polyp segmentation is essential for early colorectal cancer detection and for computer-aided diagnosis. While convolutional neural network-, Transformer-, and Mamba-based U-Net variants have achieved strong performance, they still struggle to capture geometric and structural cues, especially in low-contrast or cluttered colonoscopy scenes. To address this challenge, we propose a novel Geometric Prior-guided Module (GPM) that injects explicit geometric priors into U-Net-based architectures for polyp segmentation. Specifically, we fine-tune the Visual Geometry Grounded Transformer (VGGT) on a simulated ColonDepth dataset to estimate depth maps of polyp images tailored to the endoscopic domain. These depth maps are then processed by GPM to encode geometric priors into the encoder's feature maps, where they are further refined using spatial and channel attention mechanisms that emphasize both local spatial and global channel information. GPM is plug-and-play and can be seamlessly integrated into diverse U-Net variants. Extensive experiments on five public polyp segmentation datasets demonstrate consistent gains over three strong baselines. Code and the generated depth maps are available at: https://github.com/fvazqu/GPM-PolypSeg
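The abstract describes the data flow of GPM only at a high level: a depth map is encoded into an encoder feature map, and the result is refined by channel and spatial attention. The NumPy sketch below illustrates that flow under stated assumptions and is not the authors' implementation (see the repository above for that). The function names, the additive injection, the channel-then-spatial ordering, and the weight arrays `w_depth` and `w_channel` are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def resize_nearest(depth, out_h, out_w):
    """Nearest-neighbour resize of a 2-D depth map to (out_h, out_w)."""
    in_h, in_w = depth.shape
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return depth[rows[:, None], cols[None, :]]


def depth_guided_fusion(features, depth, w_depth, w_channel):
    """Hypothetical fusion of encoder features (C, H, W) with a depth map.

    w_depth   (C,):   per-channel scale for the injected depth (assumed).
    w_channel (C, C): linear map used by the channel gate (assumed).
    """
    _, h, w = features.shape

    # Match the depth map to the feature resolution and scale it to [0, 1].
    d = resize_nearest(depth, h, w)
    d = (d - d.min()) / (d.max() - d.min() + 1e-8)

    # Inject the geometric prior: add a per-channel scaled copy of the depth.
    injected = features + w_depth[:, None, None] * d[None, :, :]

    # Channel attention: global average pool -> linear map -> sigmoid gate.
    pooled = injected.mean(axis=(1, 2))            # shape (C,)
    channel_gate = sigmoid(w_channel @ pooled)     # shape (C,)
    gated = injected * channel_gate[:, None, None]

    # Spatial attention: combine mean and max over channels -> sigmoid gate.
    spatial_gate = sigmoid(0.5 * (gated.mean(axis=0) + gated.max(axis=0)))
    return gated * spatial_gate[None, :, :]


# Toy usage with random data standing in for real features and depth.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 16, 16))   # one encoder stage, C=8
depth = rng.random((64, 64))                  # depth map at input resolution
fused = depth_guided_fusion(
    features,
    depth,
    w_depth=0.1 * rng.standard_normal(8),
    w_channel=0.1 * rng.standard_normal((8, 8)),
)
print(fused.shape)  # (8, 16, 16)
```

Because the output has the same shape as the input feature map, a block of this kind could in principle be placed after any encoder stage without changing the rest of the network, which is one plausible reading of the "plug-and-play" property.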