AI Summary
Current automatic IVF embryo selection methods suffer from two key limitations: (1) they rely on localized morphological features without holistic quality assessment, or (2) they depend on clinical pregnancy outcomes, which are heavily confounded by non-embryonic factors. To address this, we introduce the novel task of holistic embryo quality grading directly from full-length time-lapse monitoring (TLM) videos. We curate a large-scale, clinically validated dataset of over 2,500 real TLM videos with expert annotations. We propose CoSTeM, the first end-to-end video-level framework for holistic grading, which jointly models static morphology and dynamic developmental trajectories. CoSTeM integrates a mixture-of-experts layer with cross-attention fusion, a temporal selection module, and a time-aware Transformer to learn complementary spatiotemporal representations. Evaluated on real-world clinical data, CoSTeM significantly outperforms state-of-the-art methods, offering an interpretable, deployable AI solution for embryo screening. Both code and dataset will be publicly released.
Abstract
Artificial intelligence has recently shown promise in automated embryo selection for In-Vitro Fertilization (IVF). However, current approaches either address only part of embryo evaluation, lacking a holistic quality assessment, or target clinical outcomes that are inevitably confounded by extra-embryonic factors; both limit clinical utility. To bridge this gap, we propose a new task called Video-Based Embryo Grading, the first paradigm that directly uses full-length time-lapse monitoring (TLM) videos to predict embryologists' overall quality assessments. To support this task, we curate a real-world clinical dataset of over 2,500 TLM videos, each annotated with a label grading the overall quality of the embryo. Grounded in clinical decision-making principles, we propose a Complementary Spatial-Temporal Pattern Mining (CoSTeM) framework that conceptually replicates embryologists' evaluation process. CoSTeM comprises two branches: (1) a morphological branch that uses a Mixture of Cross-Attentive Experts layer and a Temporal Selection Block to select discriminative local structural features, and (2) a morphokinetic branch that employs a Temporal Transformer to model global developmental trajectories. Together, the two branches integrate static and dynamic determinants for grading embryos. Extensive experiments demonstrate the superiority of our design. This work provides a valuable methodological framework for AI-assisted embryo selection. The dataset and source code will be publicly available upon acceptance.
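The abstract names CoSTeM's components but not their internals. As a rough illustration only, the NumPy sketch below wires up a two-branch forward pass under assumptions of our own: per-frame patch tokens from an unspecified frozen image encoder; per-expert learnable queries that cross-attend to those tokens, mixed by a softmax gate (standing in for the Mixture of Cross-Attentive Experts layer); a top-k frame scorer (standing in for the Temporal Selection Block); and one residual self-attention layer over frames with sinusoidal encodings of acquisition time, assumed to be in hours (standing in for the Temporal Transformer). Every name and parameter here (`TwoBranchGraderSketch`, `top_k`, `num_grades`, and so on) is hypothetical. The weights are random, and nothing in it reproduces the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention: q (M, D), k and v (N, D) -> (M, D).
    d = q.shape[-1]
    w = softmax(q @ k.T / np.sqrt(d), axis=-1)
    return w @ v


def time_encoding(hours, d):
    # Sinusoidal encoding of per-frame acquisition times (assumed unit: hours).
    i = np.arange(d // 2)
    freqs = 1.0 / (10000 ** (2 * i / d))
    angles = hours[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)


class TwoBranchGraderSketch:
    """Hypothetical, randomly initialised forward pass; not the authors' CoSTeM code."""

    def __init__(self, dim=32, num_experts=4, num_queries=2, top_k=8, num_grades=3, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(dim)
        self.dim, self.top_k = dim, top_k
        # Morphological branch: one query set per expert, a gate, and a frame scorer.
        self.expert_queries = rng.normal(0, s, (num_experts, num_queries, dim))
        self.gate = rng.normal(0, s, (dim, num_experts))
        self.frame_scorer = rng.normal(0, s, (dim,))
        # Morphokinetic branch: single-head self-attention projections.
        self.wq, self.wk, self.wv = (rng.normal(0, s, (dim, dim)) for _ in range(3))
        # Linear classifier over the concatenated branch outputs.
        self.head = rng.normal(0, s, (2 * dim, num_grades))

    def morphological(self, frames):
        # frames: (T, N, D) spatial tokens per frame.
        per_frame = []
        for tokens in frames:
            # Each expert's queries cross-attend to this frame's tokens.
            expert_out = np.stack([attention(q, tokens, tokens).mean(axis=0)
                                   for q in self.expert_queries])  # (E, D)
            gate = softmax(tokens.mean(axis=0) @ self.gate)          # (E,)
            per_frame.append(gate @ expert_out)                      # (D,)
        per_frame = np.stack(per_frame)                              # (T, D)
        # Temporal selection: keep the top-k highest-scoring frames, in time order.
        scores = per_frame @ self.frame_scorer
        keep = np.sort(np.argsort(scores)[-self.top_k:])
        return per_frame[keep].mean(axis=0), keep

    def morphokinetic(self, frames, hours):
        # Pool tokens per frame, add time encodings, apply residual self-attention.
        x = frames.mean(axis=1) + time_encoding(hours, self.dim)     # (T, D)
        x = x + attention(x @ self.wq, x @ self.wk, x @ self.wv)
        return x.mean(axis=0)

    def __call__(self, frames, hours):
        morph, keep = self.morphological(frames)
        kinetic = self.morphokinetic(frames, hours)
        logits = np.concatenate([morph, kinetic]) @ self.head
        return softmax(logits), keep


# Example: 20 frames, 7x7 = 49 patch tokens of width 32, imaged over 0-120 h.
rng = np.random.default_rng(1)
frames = rng.normal(size=(20, 49, 32))
hours = np.linspace(0.0, 120.0, 20)
probs, kept = TwoBranchGraderSketch()(frames, hours)
print(probs.shape, len(kept))  # (3,) 8
```

The sketch only shows how static features (per-frame, then frame-selected) and dynamic features (time-encoded, sequence-level) could be computed separately and concatenated before grading. Batched `einsum` calls, multi-head attention, and trained weights would all be needed before it resembled a usable model.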