GeMM-GAN: A Multimodal Generative Model Conditioned on Histopathology Images and Clinical Descriptions for Gene Expression Profile Generation

πŸ“… 2026-01-21
πŸ›οΈ International Conference on Image Analysis and Processing
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This study addresses the challenge of limited access to gene expression data due to privacy constraints and high acquisition costs, which hinders multimodal biomedical research. To overcome this, the authors propose a novel multimodal conditional generative adversarial network that, for the first time, integrates a Transformer encoder with cross-attention mechanisms to jointly model histopathology image patches and clinical text for generating biologically plausible gene expression profiles. Evaluated on The Cancer Genome Atlas (TCGA) dataset, the method significantly outperforms existing approaches, with the generated expression profiles improving downstream disease-type classification accuracy by over 11%. These results demonstrate the method’s effectiveness and innovation in synthesizing high-fidelity, multimodal biomedical data.

πŸ“ Abstract
Biomedical research increasingly relies on integrating diverse data modalities, including gene expression profiles, medical images, and clinical metadata. While medical images and clinical metadata are routinely collected in clinical practice, gene expression data presents unique challenges for widespread research use, mainly due to stringent privacy regulations and costly laboratory experiments. To address these limitations, we present GeMM-GAN, a novel Generative Adversarial Network conditioned on histopathology tissue slides and clinical metadata, designed to synthesize realistic gene expression profiles. GeMM-GAN combines a Transformer encoder for image patches with a final cross-attention mechanism between patches and text tokens, producing a conditioning vector that guides the generator toward biologically coherent gene expression profiles. We evaluate our approach on the TCGA dataset and show that our framework outperforms standard generative models, producing more realistic and functionally meaningful gene expression profiles and improving downstream disease-type prediction accuracy by more than 11% over current state-of-the-art generative models. Code will be available at: https://github.com/francescapia/GeMM-GAN
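The conditioning scheme described in the abstract can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' implementation: all dimensions, layer counts, and module names (`CrossAttentionConditioner`, `ConditionalGenerator`) are hypothetical. Patch embeddings pass through a Transformer encoder, then cross-attend to clinical text tokens; the pooled result conditions a GAN-style generator that emits a gene expression vector.

```python
import torch
import torch.nn as nn

class CrossAttentionConditioner(nn.Module):
    """Sketch: Transformer-encode image patches, then cross-attend
    patch features (queries) to clinical text tokens (keys/values)
    and mean-pool into a single conditioning vector."""
    def __init__(self, dim=64, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads,
                                           batch_first=True)
        self.patch_encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, patch_emb, text_emb):
        h = self.patch_encoder(patch_emb)                   # (B, P, D)
        fused, _ = self.cross_attn(h, text_emb, text_emb)   # (B, P, D)
        return fused.mean(dim=1)                            # (B, D)

class ConditionalGenerator(nn.Module):
    """Sketch: concatenate noise with the conditioning vector and map
    to a gene expression profile (output size is arbitrary here)."""
    def __init__(self, noise_dim=32, cond_dim=64, n_genes=1000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + cond_dim, 256), nn.ReLU(),
            nn.Linear(256, n_genes),
        )

    def forward(self, z, cond):
        return self.net(torch.cat([z, cond], dim=-1))

# Toy usage: 2 samples, 16 image patches, 8 text tokens, embedding dim 64.
B, P, T, D = 2, 16, 8, 64
cond = CrossAttentionConditioner(dim=D)(torch.randn(B, P, D),
                                        torch.randn(B, T, D))
profile = ConditionalGenerator(cond_dim=D)(torch.randn(B, 32), cond)
```

In a full GAN setup, `profile` would be scored by a discriminator against real TCGA expression profiles; only the conditioning pathway is sketched here.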
Problem

Research questions and friction points this paper is trying to address.

gene expression profiles
histopathology images
clinical metadata
data privacy
costly experiments
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal generative model
histopathology images
gene expression generation
cross-attention mechanism
GAN
πŸ”Ž Similar Papers
No similar papers found.
Francesca Pia Panaccione
DEIB - Dipartimento Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
Carlo Sgaravatti
DEIB - Dipartimento Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
Pietro Pinoli
Research Fellow, Politecnico di Milano
Bioinformatics · Machine Learning · Big Data