🤖 AI Summary
This study addresses the clinical challenges of time-consuming manual left ventricular ejection fraction (LVEF) assessment and substantial inter-observer variability in echocardiography. We propose a video-based deep learning framework for automatic LVEF estimation. Three architectural paradigms—3D Inception, two-stream networks, and CNN-RNN hybrids—are systematically compared. Furthermore, we conduct an in-depth analysis of how model capacity and critical hyperparameters—including kernel size and normalization strategy—affect generalization performance. Evaluated on the EchoNet-Dynamic dataset (10,030 echocardiographic videos), our lightweight, optimized 3D Inception variant achieves a state-of-the-art root mean squared error (RMSE) of 6.79% for video-level LVEF regression. These results demonstrate that structural simplification combined with careful hyperparameter tuning effectively mitigates overfitting and enhances clinical deployability.
📝 Abstract
Left ventricular ejection fraction (LVEF) is a key indicator of cardiac function and plays a central role in the diagnosis and management of cardiovascular disease. Echocardiography, as a readily accessible and non-invasive imaging modality, is widely used in clinical practice to estimate LVEF. However, manual assessment of cardiac function from echocardiograms is time-consuming and subject to considerable inter-observer variability. Deep learning approaches offer a promising alternative, with the potential to achieve performance comparable to that of experienced human experts. In this study, we investigate the effectiveness of several deep learning architectures for LVEF estimation from echocardiography videos, including 3D Inception, two-stream, and CNN-RNN models. We systematically evaluate architectural modifications and fusion strategies to identify configurations that maximize prediction accuracy. Models were trained and evaluated on the EchoNet-Dynamic dataset, comprising 10,030 echocardiogram videos. Our results demonstrate that modified 3D Inception architectures achieve the best overall performance, with a root mean squared error (RMSE) of 6.79%. Across architectures, we observe a tendency toward overfitting, with smaller and simpler models generally exhibiting improved generalization. Model performance was also found to be highly sensitive to hyperparameter choices, particularly convolutional kernel sizes and normalization strategies. While this study focuses on echocardiography-based LVEF estimation, the insights gained regarding architectural design and training strategies may be applicable to a broader range of medical and non-medical video analysis tasks.
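To make the reported metric concrete, the sketch below shows how the video-level RMSE used throughout this study is computed from per-video LVEF values. The example values are purely hypothetical and NumPy is assumed; this is an illustration of the metric, not the study's evaluation code.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between reference and predicted LVEF values (in %)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical per-video LVEF values (%): sonographer reference vs. model output
true_lvef = [55.0, 62.0, 40.0, 70.0]
pred_lvef = [52.0, 65.0, 47.0, 68.0]

print(f"RMSE: {rmse(true_lvef, pred_lvef):.2f}%")
```

Because LVEF is itself expressed as a percentage, the RMSE of 6.79% reported above is in the same units as the quantity being predicted, i.e. an average error of roughly 6.8 ejection-fraction points per video.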