🤖 AI Summary
In infertility treatment, conventional embryo grading suffers from high subjectivity, while inefficient integration of multimodal data—static images, time-lapse videos, and clinical tabular records—hinders improvements in pregnancy success rates. This paper presents a systematic review of multimodal artificial intelligence for embryo grading and pregnancy outcome prediction. It is the first to delineate technical pathways for cross-modal fusion and identify key clinical adoption bottlenecks, proposing a progressive fusion paradigm tailored to small-sample and heterogeneous data. Methodologically, it unifies CNNs/Transformers, LSTMs/3D-CNNs, cross-modal attention mechanisms, and interpretability techniques, synthesizing 12 representative architectures and 27 clinically validated metrics. Empirical results demonstrate that multimodal models improve pregnancy prediction AUC by 0.08–0.15 over unimodal baselines, establishing a standardized methodological framework for AI-enhanced assisted reproductive technology.
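The cross-modal attention mechanism named above can be illustrated with a minimal, dependency-free sketch. It shows one query vector from one modality attending over token vectors from another modality, for example a clinical-record embedding attending over per-frame time-lapse embeddings. Several assumptions apply. The embeddings are taken as already produced by upstream encoders (such as a CNN for frames and an MLP for tabular records). Learned query, key, and value projections are omitted. The names `cross_modal_attention` and `fuse` are hypothetical. None of this is drawn from the reviewed papers.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_modal_attention(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention in which a query from one
    modality attends over tokens from another modality.

    Assumption: no learned W_Q / W_K / W_V projections; inputs are used as-is.
    """
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    # Weighted sum of the value vectors, one output dimension at a time.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


def fuse(tabular_emb: Vector, frame_embs: List[Vector]) -> Vector:
    # Hypothetical fusion step: the tabular embedding queries the video frames,
    # and the attention-pooled video context is concatenated onto it to form
    # a joint representation for a downstream grading/outcome classifier.
    context = cross_modal_attention(tabular_emb, frame_embs, frame_embs)
    return tabular_emb + context


if __name__ == "__main__":
    tabular = [1.0, 0.0]                             # toy clinical-record embedding
    frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]    # toy per-frame embeddings
    print(fuse(tabular, frames))
```

In practice such models would use learned projections, multiple heads, and batched tensor operations. The pure-Python form here only makes the weighting step explicit: frames whose embeddings align more closely with the clinical query receive larger weights in the pooled context.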
📝 Abstract
Infertility is a global health problem that has long affected people worldwide, and advances in assisted reproductive technology (ART) now make it possible to treat it effectively. However, conventional in vitro fertilization and embryo transfer (IVF-ET) still faces many obstacles to improving pregnancy success rates, including the subjectivity of embryo grading and the inefficient integration of multimodal data. Introducing artificial intelligence (AI)-based techniques is therefore particularly important. This article takes a new perspective, reviewing progress in applying multimodal AI to embryo grading and pregnancy prediction organized by data modality (static images, time-lapse videos, and structured tabular data). It also discusses the main challenges in current research, such as the complexity of multimodal information fusion and the scarcity of data.