🤖 AI Summary
Echocardiography demands considerable operator expertise, and substantial inter-individual anatomical variability hinders personalized navigation, yet existing methods rely on population-averaged models. To address this, we propose a sequence-aware self-supervised pretraining paradigm for guiding novice operators. It introduces, for the first time, joint image-action modeling over scanning trajectories, so that personalized 2D/3D cardiac structure is learned from the scanning sequence itself. Our method integrates LSTM/Transformer-based temporal modeling with masked image and action prediction, enabling end-to-end estimation of the transducer’s 6-DOF pose. Evaluated on a large-scale dataset of 1.36 million samples, it reduces translational error by 15.90–36.87% and rotational error by 11.13–20.77% relative to state-of-the-art methods. The core contribution is a sequence-aware framework that supports personalized anatomical modeling and more accurate transducer navigation.
📝 Abstract
Cardiac ultrasound probe guidance aims to help novices adjust the 6-DOF probe pose to obtain high-quality sectional images. Cardiac ultrasound faces two major challenges: (1) the inherently complex structure of the heart, and (2) significant variation between individuals. Previous works have learned only the population-averaged 2D and 3D structure of the heart rather than personalized cardiac structural features, leading to a performance bottleneck. Clinically, we observed that sonographers adjust their understanding of a patient's cardiac structure based on prior scanning sequences and modify their scanning strategies accordingly. Inspired by this, we propose a sequence-aware self-supervised pre-training method. Specifically, our approach learns personalized 2D and 3D cardiac structural features by predicting the masked-out images and actions in a scanning sequence. We hypothesize that if the model can predict the missing content, it has acquired a good understanding of the personalized cardiac structure. In the downstream probe guidance task, we also introduce a sequence modeling approach that models individual cardiac structural information from the images and actions in historical scan data, enabling more accurate navigation decisions. Experiments on a large-scale dataset with 1.36 million samples demonstrate that the proposed sequence-aware paradigm significantly reduces navigation errors compared to state-of-the-art methods, with translation errors decreasing by 15.90% to 36.87% and rotation errors decreasing by 11.13% to 20.77%.
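The masking idea in the abstract, hiding some images and actions in a scanning sequence and asking the model to predict them, can be pictured with a small sketch. This is not the authors' implementation. The `Step` container, the `mask_sequence` function, the two mask ratios, and the use of `None` as the mask placeholder are all assumptions made for illustration, since the abstract does not say how masking is performed or how images and actions are encoded.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Action = Tuple[float, float, float, float, float, float]  # assumed 6-DOF probe motion


@dataclass
class Step:
    """One element of a scanning sequence (hypothetical container)."""
    image: Optional[str]       # stand-in for an ultrasound frame or its features
    action: Optional[Action]   # probe motion taken at this step; None means masked


def mask_sequence(
    steps: Sequence[Step],
    image_ratio: float,
    action_ratio: float,
    seed: int = 0,
) -> Tuple[List[Step], List[Tuple[int, str, object]]]:
    """Hide a random subset of images and actions.

    Returns the masked sequence plus a list of (index, kind, original value)
    targets that a model would be trained to predict.
    """
    rng = random.Random(seed)
    masked: List[Step] = []
    targets: List[Tuple[int, str, object]] = []
    for i, step in enumerate(steps):
        hide_image = rng.random() < image_ratio
        hide_action = rng.random() < action_ratio
        masked.append(
            Step(
                image=None if hide_image else step.image,
                action=None if hide_action else step.action,
            )
        )
        if hide_image:
            targets.append((i, "image", step.image))
        if hide_action:
            targets.append((i, "action", step.action))
    return masked, targets


if __name__ == "__main__":
    scan = [Step(f"frame_{i}", (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)) for i in range(6)]
    masked_scan, prediction_targets = mask_sequence(scan, image_ratio=0.3, action_ratio=0.3)
    for kind_index, kind, _ in prediction_targets:
        print(f"predict {kind} at step {kind_index}")
```

In a real pipeline the masked sequence would be fed to a sequence model and the targets would drive an image reconstruction loss and an action regression loss. Those components are omitted here because the abstract does not describe them.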