EmbryoDiff: A Conditional Diffusion Framework with Multi-Focal Feature Fusion for Fine-Grained Embryo Developmental Stage Recognition

📅 2025-11-14
🤖 AI Summary
In IVF, fine-grained classification of embryo developmental stages faces two key challenges: (1) existing discriminative models neglect the distributional prior of embryonic development, and (2) reliance on single-focal imaging yields incomplete representations and feature ambiguity under cell occlusion. To address these, we propose EmbryoDiff, a two-stage diffusion-based framework that formulates the task as a conditional sequence denoising process. Stage I trains and then freezes a frame-level encoder that extracts features from multi-focal image stacks. Stage II fuses these features across focal planes into a 3D-aware morphological representation, derives semantic and boundary cues from it, and injects them into the denoising process through a Hybrid Semantic-Boundary Condition Block. To our knowledge, this is the first work to integrate embryonic developmental priors with multi-focal feature learning within a diffusion-based classification paradigm. On two benchmark datasets, the method achieves state-of-the-art results, reaching 82.8% and 81.3% accuracy, respectively, with only a single denoising step.

📝 Abstract
Identification of fine-grained embryo developmental stages during In Vitro Fertilization (IVF) is crucial for assessing embryo viability. Although recent deep learning methods have achieved promising accuracy, existing discriminative models fail to utilize the distributional prior of embryonic development to improve accuracy. Moreover, their reliance on single-focal information leads to incomplete embryonic representations, making them susceptible to feature ambiguity under cell occlusions. To address these limitations, we propose EmbryoDiff, a two-stage diffusion-based framework that formulates the task as a conditional sequence denoising process. Specifically, we first train and freeze a frame-level encoder to extract robust multi-focal features. In the second stage, we introduce a Multi-Focal Feature Fusion Strategy that aggregates information across focal planes to construct a 3D-aware morphological representation, effectively alleviating ambiguities arising from cell occlusions. Building on this fused representation, we derive complementary semantic and boundary cues and design a Hybrid Semantic-Boundary Condition Block to inject them into the diffusion-based denoising process, enabling accurate embryonic stage classification. Extensive experiments on two benchmark datasets show that our method achieves state-of-the-art results. Notably, with only a single denoising step, our model obtains the best average test performance, reaching 82.8% and 81.3% accuracy on the two datasets, respectively.
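
The abstract does not spell out how the Multi-Focal Feature Fusion Strategy combines focal planes. As a minimal sketch, assuming a softmax-weighted average of per-plane feature vectors, the aggregation could look like the following. The function name `fuse_focal_planes` and the per-plane relevance scores are hypothetical and not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_focal_planes(plane_features, plane_scores):
    """Fuse one feature vector per focal plane into a single vector.

    ASSUMPTION: fusion is a softmax-weighted average; the paper's actual
    strategy may differ. `plane_scores` are hypothetical relevance scores,
    one per focal plane (higher means the plane contributes more).
    """
    if not plane_features or len(plane_features) != len(plane_scores):
        raise ValueError("need one score per focal plane")
    weights = softmax(plane_scores)
    dim = len(plane_features[0])
    fused = [0.0] * dim
    for weight, features in zip(weights, plane_features):
        if len(features) != dim:
            raise ValueError("all focal planes must share one feature size")
        for i, value in enumerate(features):
            fused[i] += weight * value
    return fused


# Three focal planes, each with a 2-dimensional feature vector.
planes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = fuse_focal_planes(planes, [0.0, 0.0, 0.0])
```

With equal scores the fusion reduces to a plain mean across focal planes. A learned scorer could, in principle, down-weight planes in which a cell is occluded.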
Problem

Research questions and friction points this paper is trying to address.

Recognizing fine-grained embryo developmental stages during IVF procedures
Addressing feature ambiguity caused by cell occlusions in embryo imaging
Utilizing embryonic development distributional priors to improve classification accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conditional diffusion framework for embryo stage recognition
Multi-focal feature fusion for 3D-aware representation
Hybrid semantic-boundary condition block for denoising process
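
How the semantic and boundary cues are derived is not detailed in this summary. The sketch below assumes the semantic cue is the per-frame softmax over stage logits and the boundary cue flags frames whose most likely stage differs from that of the previous frame. The names `semantic_cue`, `boundary_cue`, and `hybrid_condition` are hypothetical, and plain concatenation stands in for the paper's Hybrid Semantic-Boundary Condition Block.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def semantic_cue(frame_logits):
    """Per-frame stage probabilities (ASSUMED form of the semantic cue)."""
    return [softmax(logits) for logits in frame_logits]


def boundary_cue(stage_probs):
    """1.0 where the most likely stage changes from the previous frame.

    ASSUMPTION: a boundary is a change in the argmax stage between
    consecutive frames; the first frame is never marked as a boundary.
    """
    if not stage_probs:
        return []
    labels = [max(range(len(p)), key=p.__getitem__) for p in stage_probs]
    cue = [0.0]
    for previous, current in zip(labels, labels[1:]):
        cue.append(1.0 if current != previous else 0.0)
    return cue


def hybrid_condition(frame_logits):
    """Concatenate both cues per frame into one conditioning vector."""
    probs = semantic_cue(frame_logits)
    bounds = boundary_cue(probs)
    return [p + [b] for p, b in zip(probs, bounds)]


# Four frames, three stages: the stage changes at frames 2 and 3.
logits = [[2.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 4.0]]
condition = hybrid_condition(logits)
```

Each frame's conditioning vector then holds the stage probabilities followed by a single boundary flag, which a denoising network could consume alongside the noisy label sequence.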
Authors

Yong Sun
The Hong Kong University of Science and Technology (Guangzhou)
Zhengjie Zhang
The Hong Kong University of Science and Technology (Guangzhou)
Junyu Shi
The Hong Kong University of Science and Technology (Guangzhou)
Zhiyuan Zhang
The Hong Kong University of Science and Technology (Guangzhou)
Lijiang Liu
The Hong Kong University of Science and Technology (Guangzhou)
Qiang Nie
Assistant Professor, Hong Kong University of Science and Technology, Guangzhou, China
robotics, human-robot interaction, artificial intelligence, computer vision