🤖 AI Summary
Problem: Pelvic fracture fragment segmentation in CT and X-ray images remains challenging because of complex anatomy and imaging limitations, including overlapping anatomical structures in X-ray projections. Method: We collected a multi-center dataset of 150 CT scans, generated a large set of simulated X-ray images with DeepDRR, and organized the PENGWIN benchmark challenge (a MICCAI 2024 satellite event), in which final submissions from 16 international teams were evaluated. Contribution/Results: Algorithms were assessed under a multi-metric evaluation scheme centered on average fragment-wise IoU. The top-performing CT algorithm achieved an IoU of 0.930, whereas the best X-ray algorithm reached 0.774. Teams adopted different instance representations, such as primary-secondary classification versus boundary-core separation, which led to distinct segmentation strategies. The challenge also exposed inherent ambiguity in how fracture fragments are defined, particularly for incomplete fractures. These findings suggest that interactive, human-in-the-loop segmentation may be essential for reliable clinical use in trauma diagnosis, surgical planning, and intraoperative guidance.
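The headline numbers are averages of a fragment-wise IoU, but the exact matching rule used by the challenge is not given here. The following is a minimal sketch that assumes the same integer label denotes the same fragment in the reference and the prediction. Under that assumption, a fragment present in only one of the two scores 0, so both missed and spurious fragments lower the mean. The helper name `fragment_wise_iou` is hypothetical, and this is not the official PENGWIN evaluation code.

```python
import numpy as np


def fragment_wise_iou(gt: np.ndarray, pred: np.ndarray, background: int = 0) -> float:
    """Mean IoU over fragment labels (hypothetical helper).

    Assumes the same integer label denotes the same fragment in `gt` and
    `pred`; a label present in only one of them contributes an IoU of 0.
    """
    if gt.shape != pred.shape:
        raise ValueError("gt and pred must have the same shape")
    # Every fragment label that appears in either the reference or the prediction.
    labels = np.union1d(np.unique(gt), np.unique(pred))
    labels = labels[labels != background]
    if labels.size == 0:
        return 1.0  # empty reference and empty prediction agree perfectly
    ious = []
    for label in labels:
        g = gt == label
        p = pred == label
        intersection = np.logical_and(g, p).sum()
        union = np.logical_or(g, p).sum()  # > 0 because the label occurs somewhere
        ious.append(intersection / union)
    return float(np.mean(ious))


# Toy 2D example; a CT volume would be a 3D array handled identically.
gt = np.array([[1, 1, 0],
               [2, 2, 0]])
pred = np.array([[1, 0, 0],
                 [2, 2, 2]])
print(fragment_wise_iou(gt, pred))  # (1/2 + 2/3) / 2 ≈ 0.5833
```

Averaging per fragment, rather than pooling all foreground voxels, keeps small fragments from being swamped by large ones. A real challenge metric may instead match fragments by overlap, or treat unmatched labels differently.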
📝 Abstract
The segmentation of pelvic fracture fragments in CT and X-ray images is crucial for trauma diagnosis, surgical planning, and intraoperative guidance. However, accurately and efficiently delineating the bone fragments remains a significant challenge due to complex anatomy and imaging limitations. The PENGWIN challenge, organized as a MICCAI 2024 satellite event, aimed to advance automated fracture segmentation by benchmarking state-of-the-art algorithms on these complex tasks. A diverse dataset of 150 CT scans was collected from multiple clinical centers, and a large set of simulated X-ray images was generated using the DeepDRR method. Final submissions from 16 teams worldwide were evaluated under a rigorous multi-metric testing scheme. The top-performing CT algorithm achieved an average fragment-wise intersection over union (IoU) of 0.930, demonstrating satisfactory accuracy. However, in the X-ray task, the best algorithm attained an IoU of 0.774, highlighting the greater challenges posed by overlapping anatomical structures. Beyond the quantitative evaluation, the challenge revealed methodological diversity in algorithm design. Variations in instance representation, such as primary-secondary classification versus boundary-core separation, led to differing segmentation strategies. Despite promising results, the challenge also exposed inherent uncertainties in fragment definition, particularly in cases of incomplete fractures. These findings suggest that interactive segmentation approaches, integrating human decision-making with task-relevant information, may be essential for improving model reliability and clinical applicability.
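The abstract contrasts primary-secondary classification with boundary-core separation without detailing either. One common way to realize boundary-core separation is to train a network on three classes (background, fragment core, fragment boundary) and then rebuild instances in post-processing. In that post-processing step, connected core regions are labelled, and every foreground voxel is assigned to its nearest core. The sketch below assumes SciPy's `ndimage` module. The helper names and the one-voxel erosion depth are hypothetical, and the code is not any participating team's method.

```python
import numpy as np
from scipy import ndimage

BACKGROUND, CORE, BOUNDARY = 0, 1, 2


def encode_boundary_core(instances: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Turn an instance label map into a 3-class target (hypothetical helper).

    Each fragment's eroded interior becomes CORE and its outer rim BOUNDARY,
    so touching fragments are separated by a band of BOUNDARY voxels.
    """
    target = np.zeros(instances.shape, dtype=np.uint8)
    for label in np.unique(instances):
        if label == 0:
            continue
        mask = instances == label
        core = ndimage.binary_erosion(mask, iterations=iterations)
        target[mask] = BOUNDARY
        target[core] = CORE
    return target


def decode_boundary_core(target: np.ndarray) -> np.ndarray:
    """Recover instances: label connected cores, then give each foreground
    voxel the label of its nearest core voxel (hypothetical helper)."""
    cores, n = ndimage.label(target == CORE)
    if n == 0:
        return np.zeros(target.shape, dtype=np.int32)
    # distance_transform_edt measures distance to the nearest zero element,
    # so passing `cores == 0` makes the returned indices point at core voxels.
    _, nearest_idx = ndimage.distance_transform_edt(cores == 0, return_indices=True)
    nearest_label = cores[tuple(nearest_idx)]
    return np.where(target != BACKGROUND, nearest_label, 0).astype(np.int32)


# Two fragments that touch along a fracture line (toy 2D slice).
inst = np.zeros((7, 12), dtype=np.int32)
inst[1:6, 1:6] = 1
inst[1:6, 6:11] = 2

_, n_naive = ndimage.label(inst > 0)
print(n_naive)  # 1: plain foreground components merge the fragments
decoded = decode_boundary_core(encode_boundary_core(inst))
print(np.unique(decoded))  # [0 1 2]: the two fragments are recovered
```

Recovered labels are connected-component indices, so in general they match the original fragment labels only up to a permutation. Fragments thinner than roughly `2 * iterations + 1` voxels lose their core and get absorbed by a neighbor. Because each voxel carries a single label, this representation suits CT volumes better than X-ray projections, where fragments can overlap.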