🤖 AI Summary
Fine-grained classification of embryonic developmental stages in IVF faces two key challenges: (1) existing discriminative models neglect the distributional prior of embryonic development, and (2) reliance on single-focal imaging yields incomplete representations that become ambiguous under cell occlusion. To address these, we propose EmbryoDiff, a two-stage diffusion-based framework that casts stage classification as conditional sequence denoising. Stage I trains and then freezes a frame-level encoder that extracts features from multi-focal image stacks. Stage II fuses these features across focal planes into a 3D-aware morphological representation, derives complementary semantic and boundary cues from it, and injects them into the denoising process through a Hybrid Semantic-Boundary Condition Block. To our knowledge, this is the first work to integrate embryonic developmental priors with multi-focal feature learning within a diffusion-based classification paradigm. On two benchmark datasets, the method reaches 82.8% and 81.3% accuracy with only a single denoising step, the best average test performance among the compared methods.
📝 Abstract
Identification of fine-grained embryo developmental stages during In Vitro Fertilization (IVF) is crucial for assessing embryo viability. Although recent deep learning methods have achieved promising accuracy, existing discriminative models fail to utilize the distributional prior of embryonic development to improve accuracy. Moreover, their reliance on single-focal information leads to incomplete embryonic representations, making them susceptible to feature ambiguity under cell occlusions. To address these limitations, we propose EmbryoDiff, a two-stage diffusion-based framework that formulates the task as a conditional sequence denoising process. Specifically, we first train and freeze a frame-level encoder to extract robust multi-focal features. In the second stage, we introduce a Multi-Focal Feature Fusion Strategy that aggregates information across focal planes to construct a 3D-aware morphological representation, effectively alleviating ambiguities arising from cell occlusions. Building on this fused representation, we derive complementary semantic and boundary cues and design a Hybrid Semantic-Boundary Condition Block to inject them into the diffusion-based denoising process, enabling accurate embryonic stage classification. Extensive experiments on two benchmark datasets show that our method achieves state-of-the-art results. Notably, with only a single denoising step, our model obtains the best average test performance, reaching 82.8% and 81.3% accuracy on the two datasets, respectively.
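The data flow described in the abstract (fuse focal planes, derive semantic and boundary cues, run one conditional denoising step) can be illustrated with a toy sketch in plain Python. This is not the EmbryoDiff implementation: the attention-style fusion, the frame-difference boundary cue, the linear class weights, and the names `fuse_focal_planes`, `boundary_cues`, and `denoise_once` are all assumptions made for illustration, and the random weights stand in for a trained network.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_focal_planes(planes):
    """Fuse one frame's focal-plane features (F vectors of length D) into one vector.

    Assumption: weights come from each plane's scaled dot product with the
    mean plane feature; the paper's actual fusion strategy may differ.
    """
    dim = len(planes[0])
    mean = [sum(p[d] for p in planes) / len(planes) for d in range(dim)]
    scores = [sum(a * b for a, b in zip(p, mean)) / math.sqrt(dim) for p in planes]
    weights = softmax(scores)
    return [sum(w * p[d] for w, p in zip(weights, planes)) for d in range(dim)]


def boundary_cues(fused_seq):
    """Hypothetical boundary cue: Euclidean change between consecutive fused frames."""
    cues = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(fused_seq, fused_seq[1:]):
        cues.append(math.dist(prev, cur))
    return cues


def denoise_once(noisy_seq, fused_seq, class_weights, noise_scale=0.1):
    """Toy single-step denoiser returning per-frame class probabilities.

    Semantic cue: linear class logits computed from the fused feature.
    Boundary cue: a larger frame-to-frame change shrinks the noisy input's influence.
    """
    cues = boundary_cues(fused_seq)
    probs = []
    for noisy, feat, cue in zip(noisy_seq, fused_seq, cues):
        semantic = [sum(w * x for w, x in zip(row, feat)) for row in class_weights]
        gate = noise_scale / (1.0 + cue)
        probs.append(softmax([s + gate * n for s, n in zip(semantic, noisy)]))
    return probs


# Illustrative sizes: frames, focal planes, feature dimension, stage classes.
T, F, D, C = 6, 3, 4, 5
rng = random.Random(0)
stack = [[[rng.gauss(0, 1) for _ in range(D)] for _ in range(F)] for _ in range(T)]
class_weights = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(C)]  # untrained stand-in

fused_seq = [fuse_focal_planes(planes) for planes in stack]
noise = [[rng.gauss(0, 1) for _ in range(C)] for _ in range(T)]  # start from pure noise
probs = denoise_once(noise, fused_seq, class_weights)
stages = [max(range(C), key=p.__getitem__) for p in probs]  # argmax stage per frame
```

In this sketch the denoiser maps a pure-noise label sequence to class probabilities in one pass, which mirrors the single-denoising-step setting reported above; a real model would replace the linear map with a learned conditional network.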