🤖 AI Summary
Accurate automatic identification of the end-diastolic (ED) and end-systolic (ES) phases is critical for cardiac functional assessment, yet existing data-driven approaches rely heavily on large-scale manual annotations (e.g., segmentation masks, volumetric measurements, or phase labels), which limits clinical deployment. This work proposes a fully self-supervised framework for ED/ES localization in adult and fetal echocardiographic videos that requires no manual annotations. The method trains a spatiotemporal reconstruction model to implicitly learn latent cardiac motion trajectories. Its key contributions are embedding interpretable motion-pattern learning into the self-supervised reconstruction objective and an unsupervised phase localization algorithm. On EchoNet-Dynamic, the method achieves mean absolute errors of 3 frames (58.3 ms) for ED and 2 frames (38.8 ms) for ES, matching state-of-the-art supervised methods; on fetal data, the errors are 1.46 frames (20.7 ms) for ED and 1.74 frames (25.3 ms) for ES. The results indicate that the framework generalizes across adult and fetal echocardiography and offers a scalable option for clinical populations lacking annotated data.
📝 Abstract
Identifying the cardiac phase is an essential step in the analysis and diagnosis of cardiac function. Automatic methods for cardiac phase detection, especially data-driven ones, typically require extensive annotations, which are time-consuming and labor-intensive to produce. In this paper, we present an unsupervised framework for end-diastole (ED) and end-systole (ES) detection through self-supervised learning of latent cardiac motion trajectories from 4-chamber-view echocardiography videos. Our method eliminates the need for manual annotations, including ED and ES indices, segmentation, or volumetric measurements, by training a reconstruction model to encode interpretable spatiotemporal motion patterns. Evaluated on the EchoNet-Dynamic benchmark, the approach achieves a mean absolute error (MAE) of 3 frames (58.3 ms) for ED detection and 2 frames (38.8 ms) for ES detection, matching state-of-the-art supervised methods. Extended to fetal echocardiography, the model demonstrates robust performance, with an MAE of 1.46 frames (20.7 ms) for ED and 1.74 frames (25.3 ms) for ES, even though the fetal heart model is built from non-standardized heart views owing to variability in fetal heart positioning. Our results demonstrate the potential of the proposed latent motion trajectory strategy for cardiac phase detection in adult and fetal echocardiography. This work advances unsupervised cardiac motion analysis, offering a scalable solution for clinical populations lacking annotated data. Code will be released at https://github.com/YingyuYyy/CardiacPhase.
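The reported errors are given both in frames and in milliseconds. As a minimal sketch of how such a pair of numbers can be computed, the snippet below assumes that each video has its own frame rate and that each per-video frame error is converted to milliseconds before averaging. The evaluation protocol is not stated here, so this is an illustrative assumption rather than the authors' procedure. The function name `phase_mae` and all input values are hypothetical.

```python
def phase_mae(pred_frames, true_frames, fps):
    """Return (MAE in frames, MAE in milliseconds) for one cardiac phase.

    pred_frames, true_frames: predicted and reference frame indices, one per video.
    fps: frame rate of each video in frames per second (assumed to be known).
    """
    if not (len(pred_frames) == len(true_frames) == len(fps)):
        raise ValueError("inputs must have the same length")
    if len(pred_frames) == 0:
        raise ValueError("inputs must not be empty")

    # Absolute error per video, in frames.
    frame_err = [abs(p - t) for p, t in zip(pred_frames, true_frames)]
    # Convert each error to milliseconds using that video's own frame rate.
    ms_err = [e / f * 1000.0 for e, f in zip(frame_err, fps)]

    n = len(frame_err)
    return sum(frame_err) / n, sum(ms_err) / n


# Hypothetical ED predictions for three videos (not data from the paper).
mae_frames, mae_ms = phase_mae(
    pred_frames=[10, 42, 77],
    true_frames=[12, 41, 80],
    fps=[50.0, 50.0, 30.0],
)
print(f"MAE: {mae_frames:.2f} frames, {mae_ms:.1f} ms")
```

Under this assumption the millisecond error is not a fixed multiple of the frame error, because videos recorded at different frame rates contribute different durations per frame.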