🤖 AI Summary
This work addresses the challenge of deploying high-accuracy polyp segmentation models in resource-constrained settings such as primary care clinics or mobile endoscopy units, where GPU dependence is prohibitive. To this end, the authors propose the UltraSeg family of ultra-lightweight architectures, which achieves native real-time inference on CPU (90 FPS on a single core) with only 0.108–0.13 million parameters, roughly 0.4% of a standard U-Net. By jointly optimizing encoder-decoder widths, incorporating constrained dilated convolutions to expand the receptive field, and designing a lightweight cross-layer fusion module, UltraSeg retains over 94% of U-Net's Dice score across seven public datasets. The family covers both single-center optimization and multi-center generalization, offering a plug-and-play clinical solution for low-resource environments.
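The receptive-field claim can be checked with simple arithmetic: for a stack of stride-1 convolutions, each layer adds `(kernel_size - 1) * dilation` to the receptive field, so dilated layers enlarge it without adding parameters. A minimal sketch (illustrative only; the actual layer configuration of UltraSeg is not specified here):

```python
def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    layers: list of (kernel_size, dilation) tuples.
    RF = 1 + sum over layers of (kernel_size - 1) * dilation.
    """
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d
    return rf

# Four plain 3x3 convolutions: RF = 1 + 4 * 2 = 9
plain = receptive_field([(3, 1)] * 4)

# Same parameter count with dilations 1, 2, 4, 8: RF = 1 + 2 * (1+2+4+8) = 31
dilated = receptive_field([(3, 1), (3, 2), (3, 4), (3, 8)])

print(plain, dilated)  # 9 31
```

The dilated stack more than triples the receptive field at identical parameter cost, which is why dilation is a natural fit for the extreme-compression regime.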
📝 Abstract
Early detection of colorectal cancer hinges on real-time, accurate polyp identification and resection, yet current high-precision segmentation models rely on GPUs, making them impractical to deploy in primary hospitals, mobile endoscopy units, or capsule robots. To bridge this gap, we present the UltraSeg family, which operates in an extreme-compression regime (<0.3 M parameters). UltraSeg-108K (0.108 M parameters) is optimized for single-center data, while UltraSeg-130K (0.13 M parameters) generalizes across multi-center, multi-modal images. By jointly optimizing encoder-decoder widths, incorporating constrained dilated convolutions to enlarge receptive fields, and integrating a lightweight cross-layer fusion module, the models achieve 90 FPS on a single CPU core without sacrificing accuracy. Evaluated on seven public datasets, UltraSeg retains over 94% of the Dice score of a 31 M-parameter U-Net while using only about 0.4% of its parameters, establishing a strong, clinically viable baseline for the extreme-compression regime and an immediately deployable solution for resource-constrained settings. This work provides not only a CPU-native solution for colonoscopy but also a reproducible blueprint for broader minimally invasive surgical vision applications. Source code is publicly available to ensure reproducibility and facilitate future benchmarking.
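The "approximately 0.4% of U-Net's parameters" figure follows directly from the stated model sizes; a quick arithmetic check, taking 31 M as the exact U-Net parameter count quoted in the abstract:

```python
# Parameter-budget check using the sizes stated in the abstract (in millions).
unet_params = 31.0
ultraseg = {"UltraSeg-108K": 0.108, "UltraSeg-130K": 0.13}

for name, p in ultraseg.items():
    ratio = p / unet_params * 100
    print(f"{name}: {ratio:.2f}% of U-Net's parameters")
# UltraSeg-108K: 0.35% of U-Net's parameters
# UltraSeg-130K: 0.42% of U-Net's parameters
```

Both variants land in the 0.35–0.42% range, consistent with the "approximately 0.4%" claim.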