Unified Cross-Modal Image Synthesis with Hierarchical Mixture of Product-of-Experts

📅 2024-10-25
🏛️ arXiv.org
📈 Citations: 2
✨ Influential: 0
🤖 AI Summary
To address cross-modal, high-resolution synthesis of preoperative MRI and intraoperative ultrasound (US) images, this paper proposes MMHVAE, a deep mixture of multimodal hierarchical variational autoencoders. The method tackles four key challenges: multimodal latent distribution modeling, missing modality estimation, training with incomplete data, and effective information fusion. Technically, it introduces, for the first time, a hierarchical mixture-of-experts architecture to jointly model multimodal latent spaces; uses explicit variational inference to impute missing modalities and dataset-level priors to improve robustness; and combines Product-of-Experts (PoE) fusion with cross-modal latent alignment to obtain a unified representation of multiparametric MRI and US. Evaluated on brain imaging data, MMHVAE achieves significant improvements in PSNR and SSIM. Synthesized images exhibit accurate anatomical structures and sharp textural details, enabling real-time intraoperative navigation.
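
The Product-of-Experts fusion mentioned above can be illustrated with a minimal sketch. This is a generic PoE over diagonal Gaussian experts, not the authors' implementation: the function name `poe_fuse`, the array shapes, and the standard-normal prior expert are all assumptions made for illustration. Each observed modality contributes a Gaussian over the latent, and missing modalities are simply left out of the product.

```python
import numpy as np

def poe_fuse(mus, logvars, observed):
    """Fuse diagonal-Gaussian experts with a N(0, I) prior expert (assumed).

    mus, logvars: array-likes of shape (M, D), one row per modality.
    observed:     length-M booleans; False rows are excluded from the product.
    Returns the mean and variance (each of shape (D,)) of the fused Gaussian.
    """
    mus = np.asarray(mus, dtype=float)
    logvars = np.asarray(logvars, dtype=float)
    mask = np.asarray(observed, dtype=float)[:, None]  # (M, 1), broadcast over D

    # Precision (inverse variance) of each expert; zero for missing modalities.
    precisions = mask * np.exp(-logvars)

    # A product of Gaussians adds precisions; the prior contributes precision 1.
    total_precision = 1.0 + precisions.sum(axis=0)

    # Precision-weighted mean; the prior mean is 0, so it adds nothing here.
    fused_mu = (precisions * mus).sum(axis=0) / total_precision
    fused_var = 1.0 / total_precision
    return fused_mu, fused_var

# Hypothetical example: two modalities, one latent dimension, second one missing.
mu, var = poe_fuse([[2.0], [4.0]], [[0.0], [0.0]], observed=[True, False])
```

With no modality observed, the fused distribution falls back to the prior. Each additional observed modality can only shrink the fused variance, which is one reason PoE-style fusion suits missing-data settings.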

📝 Abstract
We propose a deep mixture of multimodal hierarchical variational auto-encoders called MMHVAE that synthesizes missing images from observed images in different modalities. MMHVAE's design focuses on tackling four challenges: (i) creating a complex latent representation of multimodal data to generate high-resolution images; (ii) encouraging the variational distributions to estimate the missing information needed for cross-modal image synthesis; (iii) learning to fuse multimodal information in the context of missing data; (iv) leveraging dataset-level information to handle incomplete data sets at training time. Extensive experiments are performed on the challenging problem of pre-operative brain multi-parametric magnetic resonance and intra-operative ultrasound imaging.

Problem

Research questions and friction points this paper is trying to address.

Synthesizing missing images across different modalities
Creating complex latent representations for high-resolution generation
Learning multimodal fusion with incomplete training datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical mixture of multimodal variational autoencoders
Generates missing images from observed multimodal data
Learns latent representations for cross-modal image synthesis