AI Summary
Existing single-view X-ray analysis methods struggle to establish robust anatomical correspondences across orthogonal projections (e.g., anteroposterior and lateral views), limiting diagnostic accuracy for fractures, muscle injuries, and other musculoskeletal pathologies. To address this, we propose the first self-supervised learning framework for multi-view X-ray matching. Our method leverages unlabeled CT volumes to synthesize large-scale, paired digitally reconstructed radiographs (DRRs) under different projection geometries. We introduce a Transformer-based correspondence prediction model trained via contrastive learning and feature alignment to jointly model pixel-level and region-level anatomical associations across views. The learned representations are then transferred to real X-ray images without requiring manual annotations. Evaluated on both synthetic and clinical datasets, our approach significantly improves multi-view fracture classification accuracy (+8.2%) and robustness to domain shift. This work establishes a scalable, interpretable paradigm for multi-view X-ray diagnosis.
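The DRR synthesis step described above can be illustrated with a minimal parallel-beam sketch: two orthogonal views are obtained by integrating a CT attenuation volume along two different axes. This is an assumption-laden simplification for illustration (the function name `make_drr_pair` and the parallel-beam geometry are mine, not the paper's; real DRR renderers typically use cone-beam ray casting and Beer–Lambert attenuation).

```python
import numpy as np

def make_drr_pair(ct: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Project a CT attenuation volume of shape (Z, Y, X) into two orthogonal DRRs.

    Parallel-beam simplification: each DRR pixel is the line integral (here, a
    plain sum) of attenuation along one axis. Real pipelines cast cone-beam rays
    through the volume instead.
    """
    ap = ct.sum(axis=1)   # anteroposterior view: integrate along Y -> (Z, X)
    lat = ct.sum(axis=2)  # lateral view: integrate along X -> (Z, Y)
    return ap, lat
```

Because both views are rendered from the same volume, every pair produced this way is perfectly aligned by construction, which is what makes large-scale paired training data cheap to generate from unannotated CT.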
Abstract
Accurate interpretation of multi-view radiographs is crucial for diagnosing fractures, muscular injuries, and other anomalies. While significant advances have been made in AI-based analysis of single images, current methods often struggle to establish robust correspondences between different X-ray views, an essential capability for precise clinical evaluations. In this work, we present a novel self-supervised pipeline that eliminates the need for manual annotation by automatically generating a many-to-many correspondence matrix between synthetic X-ray views. This is achieved using digitally reconstructed radiographs (DRRs), which are automatically derived from unannotated CT volumes. Our approach incorporates a transformer-based training phase to accurately predict correspondences across two or more X-ray views. Furthermore, we demonstrate that learning correspondences among synthetic X-ray views can be leveraged as a pretraining strategy to enhance automatic multi-view fracture detection on real data. Extensive evaluations on both synthetic and real X-ray datasets show that incorporating correspondences improves performance in multi-view fracture classification.
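The automatically generated many-to-many correspondence matrix can be sketched under a parallel-beam assumption: an anteroposterior pixel and a lateral pixel correspond exactly when their projection rays intersect inside the volume, i.e. when they share the same axial slice. The function name, flattening order, and geometry below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def correspondence_matrix(vol_shape: tuple[int, int, int]) -> np.ndarray:
    """Many-to-many correspondence matrix between AP and lateral DRR pixels.

    Under parallel projection of a (Z, Y, X) volume, AP pixel (z, x) integrates
    voxels ct[z, :, x] and lateral pixel (z', y) integrates ct[z', y, :]; their
    rays intersect (sharing voxel (z, y, x)) exactly when z == z'.
    """
    Z, Y, X = vol_shape
    z_ap = np.repeat(np.arange(Z), X)   # slice index of each flattened AP pixel
    z_lat = np.repeat(np.arange(Z), Y)  # slice index of each flattened lateral pixel
    # Entry (i, j) is 1 if AP pixel i and lateral pixel j lie in the same slice.
    return (z_ap[:, None] == z_lat[None, :]).astype(np.float32)  # (Z*X, Z*Y)
```

In this toy geometry each AP pixel matches Y lateral pixels and each lateral pixel matches X AP pixels, so the supervision signal is inherently many-to-many and comes entirely from the known projection geometry, with no manual annotation.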