🤖 AI Summary
This work addresses the high computational cost and slow inference speed of existing 3D voxel-based models for whole-body skeletal CT segmentation, which hinder their deployment in time-sensitive clinical scenarios such as preoperative planning. To overcome this limitation, we propose Bonnet, an ultra-fast sparse volumetric segmentation framework that integrates sparse convolutions with a multi-window fusion strategy. By combining Hounsfield Unit (HU) threshold-based pre-screening, patch-wise inference with an spconv-based U-Net, and multi-window fusion, Bonnet achieves efficient whole-body bone segmentation without requiring fine-tuning. On an RTX A6000 GPU, it attains a single-case inference time of only 2.69 seconds, approximately 25× faster than nnU-Net, while maintaining high Dice accuracy in critical anatomical regions including the ribs, pelvis, and spine, thereby substantially advancing the feasibility of real-time clinical applications.
📝 Abstract
This work proposes Bonnet, an ultra-fast sparse-volume pipeline for whole-body bone segmentation from CT scans. Accurate bone segmentation is important for surgical planning and anatomical analysis, but existing 3D voxel-based models such as nnU-Net and STU-Net require heavy computation and often take several minutes per scan, which limits time-critical use. Bonnet addresses this by combining HU-based bone thresholding, patch-wise inference with a sparse spconv-based U-Net, and multi-window fusion into a full-volume prediction. Trained on TotalSegmentator and evaluated without additional tuning on RibSeg, CT-Pelvic1K, and CT-Spine1K, Bonnet achieves high Dice across ribs, pelvis, and spine while running in only 2.69 seconds per scan on an RTX A6000. Compared to strong voxel baselines, Bonnet achieves comparable accuracy while reducing inference time by roughly 25× on the same hardware and tiling setup. The toolkit and pre-trained models will be released at https://github.com/HINTLab/Bonnet.
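The HU-based pre-screening step is what makes sparse inference pay off: most of a CT volume is air or soft tissue well below bone intensity, so thresholding leaves only a small set of candidate voxels to feed the sparse U-Net. A minimal NumPy sketch of this idea is below; the threshold value of 150 HU and the function name are illustrative assumptions, not taken from the paper, and the spconv-based U-Net itself is omitted.

```python
import numpy as np

def hu_prescreen(volume_hu, threshold=150.0):
    """Keep only voxels at or above a bone-like HU threshold.

    Returns sparse (coordinates, features) pairs, the typical input
    layout for sparse-convolution libraries such as spconv.
    The 150 HU cutoff is an illustrative choice, not the paper's value.
    """
    mask = volume_hu >= threshold
    coords = np.argwhere(mask)            # (N, 3) integer voxel coordinates
    feats = volume_hu[mask][:, None]      # (N, 1) HU intensities as features
    return coords, feats

# Toy volume: soft tissue (~40 HU) with a small "bone" block (~700 HU)
vol = np.full((32, 32, 32), 40.0)
vol[10:14, 10:14, 10:14] = 700.0

coords, feats = hu_prescreen(vol)
print(coords.shape[0], "of", vol.size, "voxels retained")  # 64 of 32768
```

Only the 4×4×4 "bone" block survives the screen, so the downstream network touches 64 voxels instead of 32,768, which is the source of the speedup the abstract reports.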