GeMM-GAN: A Multimodal Generative Model Conditioned on Histopathology Images and Clinical Descriptions for Gene Expression Profile Generation

πŸ“… 2026-01-21
πŸ›οΈ International Conference on Image Analysis and Processing
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This study addresses the challenge of limited access to gene expression data due to privacy constraints and high acquisition costs, which hinders multimodal biomedical research. To overcome this, the authors propose a novel multimodal conditional generative adversarial network that, for the first time, integrates a Transformer encoder with cross-attention mechanisms to jointly model histopathology image patches and clinical text for generating biologically plausible gene expression profiles. Evaluated on The Cancer Genome Atlas (TCGA) dataset, the method significantly outperforms existing approaches, with the generated expression profiles improving downstream disease-type classification accuracy by over 11%. These results demonstrate the method’s effectiveness and innovation in synthesizing high-fidelity, multimodal biomedical data.

πŸ“ Abstract
Biomedical research increasingly relies on integrating diverse data modalities, including gene expression profiles, medical images, and clinical metadata. While medical images and clinical metadata are routinely collected in clinical practice, gene expression data presents unique challenges for widespread research use, mainly due to stringent privacy regulations and costly laboratory experiments. To address these limitations, we present GeMM-GAN, a novel Generative Adversarial Network conditioned on histopathology tissue slides and clinical metadata, designed to synthesize realistic gene expression profiles. GeMM-GAN combines a Transformer encoder for image patches with a final cross-attention mechanism between patches and text tokens, producing a conditioning vector that guides the generator toward biologically coherent gene expression profiles. We evaluate our approach on the TCGA dataset and show that our framework outperforms standard generative models, producing more realistic and functionally meaningful gene expression profiles and improving downstream disease-type prediction accuracy by more than 11% over current state-of-the-art generative models. Code will be available at: https://github.com/francescapia/GeMM-GAN
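The conditioning scheme described in the abstract can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' implementation: all dimensions, layer counts, and module names (`CrossAttentionConditioner`, `ConditionalGenerator`) are hypothetical. Patch embeddings pass through a Transformer encoder, then cross-attend to clinical text tokens; the pooled result conditions a GAN-style generator that emits a gene expression vector.

```python
import torch
import torch.nn as nn

class CrossAttentionConditioner(nn.Module):
    """Sketch: Transformer-encode image patches, then cross-attend
    patch features (queries) to clinical text tokens (keys/values)
    and mean-pool into a single conditioning vector."""
    def __init__(self, dim=64, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads,
                                           batch_first=True)
        self.patch_encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, patch_emb, text_emb):
        h = self.patch_encoder(patch_emb)                   # (B, P, D)
        fused, _ = self.cross_attn(h, text_emb, text_emb)   # (B, P, D)
        return fused.mean(dim=1)                            # (B, D)

class ConditionalGenerator(nn.Module):
    """Sketch: concatenate noise with the conditioning vector and map
    to a gene expression profile (output size is arbitrary here)."""
    def __init__(self, noise_dim=32, cond_dim=64, n_genes=1000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + cond_dim, 256), nn.ReLU(),
            nn.Linear(256, n_genes),
        )

    def forward(self, z, cond):
        return self.net(torch.cat([z, cond], dim=-1))

# Toy usage: 2 samples, 16 image patches, 8 text tokens, embedding dim 64.
B, P, T, D = 2, 16, 8, 64
cond = CrossAttentionConditioner(dim=D)(torch.randn(B, P, D),
                                        torch.randn(B, T, D))
profile = ConditionalGenerator(cond_dim=D)(torch.randn(B, 32), cond)
```

In a full GAN setup, `profile` would be scored by a discriminator against real TCGA expression profiles; only the conditioning pathway is sketched here.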
Problem

Research questions and friction points this paper is trying to address.

gene expression profiles
histopathology images
clinical metadata
data privacy
costly experiments
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal generative model
histopathology images
gene expression generation
cross-attention mechanism
GAN
πŸ”Ž Similar Papers
No similar papers found.
Francesca Pia Panaccione
DEIB - Dipartimento Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
Carlo Sgaravatti
DEIB - Dipartimento Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
Pietro Pinoli
Research Fellow, Politecnico di Milano
Bioinformatics · Machine Learning · Big Data