🤖 AI Summary
Addressing the scarcity of paired histopathological images and gene expression profiles in clinical settings, which hinders the deployment of multimodal AI, this paper introduces PathoGen, the first diffusion-based framework for high-fidelity generation of gene expression profiles from H&E-stained whole-slide images. PathoGen requires no real RNA-seq data, leveraging multimodal representation alignment and a Transformer-based encoder-decoder architecture to enable cross-modal synthesis. It incorporates conformal prediction to guarantee 95% confidence coverage and employs a distributed attention mechanism to produce biologically interpretable saliency maps. Evaluated across multiple cancer cohorts, PathoGen achieves state-of-the-art performance on downstream tasks (tumor grading AUC ≥ 0.92, survival risk prediction C-index ≥ 0.78), significantly outperforming baseline methods that rely on ground-truth transcriptomic data.
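The coverage guarantee mentioned above comes from conformal prediction. The paper's exact procedure is not shown here, but the generic technique (split conformal prediction) can be sketched in a few lines; all data and variable names below are hypothetical, standing in for predicted vs. measured expression values of a single gene.

```python
# Minimal split-conformal sketch (illustrative, not the PathoGen code).
# Idea: calibrate a residual quantile on held-out data, then wrap any new
# point prediction in an interval with >= 95% marginal coverage.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical calibration set: true values and model predictions.
y_cal_true = rng.normal(size=500)
y_cal_pred = y_cal_true + rng.normal(scale=0.3, size=500)

# Nonconformity scores: absolute residuals on the calibration set.
scores = np.abs(y_cal_true - y_cal_pred)

# Conformal quantile for 1 - alpha = 95% coverage (finite-sample corrected).
alpha = 0.05
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Prediction interval for a new model output y_hat.
y_hat = 0.1
interval = (y_hat - q, y_hat + q)
print(f"95% prediction interval: [{interval[0]:.3f}, {interval[1]:.3f}]")
```

The guarantee is marginal: over many fresh test points drawn from the same distribution as the calibration set, at least 95% of true values fall inside their intervals, regardless of the underlying model.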
📝 Abstract
Emerging research has highlighted that artificial intelligence-based multimodal fusion of digital pathology and transcriptomic features can improve cancer diagnosis (grading/subtyping) and prognosis (survival risk) prediction. However, such direct fusion for joint decision-making is impractical in real clinical settings, where histopathology is still the gold standard for diagnosis and transcriptomic tests are rarely requested, at least in public healthcare systems. With our novel diffusion-based cross-modal generative AI model PathoGen, we show that genomic expressions synthesized from digital histopathology jointly predict cancer grading and patient survival risk with high accuracy (state-of-the-art performance), certainty (through a conformal coverage guarantee) and interpretability (through distributed attention maps). The PathoGen code is available for open use by the research community through GitHub at https://github.com/Samiran-Dey/PathoGen.