Enabling Real-Time Colonoscopic Polyp Segmentation on Commodity CPUs via Ultra-Lightweight Architecture

📅 2026-02-04
🤖 AI Summary
This work addresses the challenge of deploying high-accuracy polyp segmentation models in resource-constrained settings such as primary care or mobile endoscopy, where GPU dependency is prohibitive. To this end, we propose the UltraSeg family of ultra-lightweight architectures, which achieve native real-time inference on CPU (90 FPS on a single core) with only 0.108–0.13 million parameters, approximately 0.4% of a standard U-Net. By jointly optimizing encoder-decoder width, incorporating constrained dilated convolutions to expand the receptive field, and designing a lightweight cross-layer fusion module, UltraSeg attains over 94% of U-Net's Dice score across seven public datasets. The model balances high accuracy, single-center optimization, and multi-center generalization, offering a plug-and-play clinical solution for low-resource environments.
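The summary credits the enlarged receptive field to constrained dilated convolutions. As a rough illustration (the dilation schedule below is an assumption for exposition, not the paper's actual configuration), stacking 3x3 convolutions with growing dilation rates widens the receptive field without adding any parameters, since dilation changes spacing between taps, not the weight tensor:

```python
def receptive_field(layers):
    """Receptive field of a stride-1 conv stack.

    `layers` is a list of (kernel_size, dilation) pairs. Each layer's
    effective kernel is dilation * (k - 1) + 1, so it grows the
    receptive field by dilation * (k - 1).
    """
    rf = 1
    for k, d in layers:
        rf += d * (k - 1)
    return rf

# Four plain 3x3 convs vs. four dilated ones (illustrative rates 1, 2, 4, 8).
plain = receptive_field([(3, 1)] * 4)
dilated = receptive_field([(3, 1), (3, 2), (3, 4), (3, 8)])
print(plain, dilated)  # 9 vs. 31 pixels, at identical parameter cost
```

The "constrained" qualifier presumably caps the dilation rates (aggressive dilation causes gridding artifacts), but the exact constraint is not specified in this summary.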

📝 Abstract
Early detection of colorectal cancer hinges on real-time, accurate polyp identification and resection. Yet current high-precision segmentation models rely on GPUs, making them impractical to deploy in primary hospitals, mobile endoscopy units, or capsule robots. To bridge this gap, we present the UltraSeg family, operating in an extreme-compression regime (<0.3 M parameters). UltraSeg-108K (0.108 M parameters) is optimized for single-center data, while UltraSeg-130K (0.13 M parameters) generalizes to multi-center, multi-modal images. By jointly optimizing encoder-decoder widths, incorporating constrained dilated convolutions to enlarge receptive fields, and integrating a cross-layer lightweight fusion module, the models achieve 90 FPS on a single CPU core without sacrificing accuracy. Evaluated on seven public datasets, UltraSeg retains >94% of the Dice score of a 31 M-parameter U-Net while utilizing only 0.4% of its parameters, establishing a strong, clinically viable baseline for the extreme-compression domain and offering an immediately deployable solution for resource-constrained settings. This work provides not only a CPU-native solution for colonoscopy but also a reproducible blueprint for broader minimally invasive surgical vision applications. Source code is publicly available to ensure reproducibility and facilitate future benchmarking.
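To get a feel for the <0.3 M regime the abstract describes, here is a back-of-the-envelope parameter count for a very narrow four-stage encoder-decoder. The channel widths are illustrative guesses, not UltraSeg's actual configuration; the point is only that widths in the tens keep the total far below the budget, versus U-Net's roughly 31 M:

```python
def conv2d_params(c_in, c_out, k=3):
    """Parameters in a single k x k conv layer (weights + biases)."""
    return c_in * c_out * k * k + c_out

# Hypothetical channel widths: RGB input, then four narrow encoder stages.
widths = [3, 16, 24, 32, 48]

encoder = sum(conv2d_params(widths[i], widths[i + 1]) for i in range(4))
# A dilated 3x3 conv has the same parameter count as an ordinary one;
# dilation only enlarges the receptive field.
bottleneck = conv2d_params(widths[-1], widths[-1])
decoder = sum(conv2d_params(widths[i + 1], widths[i]) for i in range(3, 0, -1))
head = conv2d_params(widths[1], 1, k=1)  # 1x1 binary segmentation head

total = encoder + bottleneck + decoder + head
print(f"total parameters: {total:,}")  # ~70 K, well under 0.3 M
```

Real budgets would also include normalization layers, skip/fusion modules, and any strided or transposed convolutions, but those add little at these widths.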
Problem

Research questions and friction points this paper is trying to address.

real-time polyp segmentation
commodity CPUs
resource-constrained settings
colorectal cancer detection
extreme model compression
Innovation

Methods, ideas, or system contributions that make the work stand out.

ultra-lightweight architecture
real-time polyp segmentation
CPU-native inference
extreme model compression
cross-layer fusion
Weihao Gao
Moonshot AI
Machine Learning, Deep Learning, Information Theory
Zhuo Deng
Applied Scientist, Amazon
Computer Vision, Deep Learning, Machine Learning, Robotics
Zheng Gong
Shenzhen International Graduate School, Tsinghua University
Lan Ma
Shenzhen International Graduate School, Tsinghua University