Multi-Modal and Multi-Attribute Generation of Single Cells with CFGen

📅 2024-07-16
🤖 AI Summary
Existing single-cell generative models mostly operate on continuous approximations of gene expression. This ignores the discrete-count nature of RNA-seq data, and these models offer little support for controllable generation under multimodal (e.g., RNA+ATAC) and multi-attribute (e.g., cell type, batch, state) conditions. To address this, the authors propose CFGen, described as the first conditional flow-matching generative model designed specifically for discrete single-cell count data. CFGen generates raw counts with a flow-based model that preserves their discreteness, and it combines multimodal embeddings with batch-aware condition encodings to support controllable joint generation across modalities. On several real-world datasets, CFGen improves rare cell type augmentation, batch-effect correction, and developmental trajectory simulation. It also reproduces biologically important features, including gene co-expression patterns and bimodal expression distributions, indicating high fidelity and biological plausibility.
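
The flow-matching part of this description can be illustrated with the standard conditional flow matching objective on a straight-line path between Gaussian noise and data. The sketch below is a minimal NumPy illustration, not CFGen's code: it assumes a linear interpolation path, a Gaussian source distribution, and a user-supplied velocity model `velocity_fn(x_t, t, cond)`. All function names are hypothetical, and `x1` stands for any continuous per-cell vector (how CFGen maps counts to such vectors is not described in this summary).

```python
import numpy as np


def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 (t=0) to data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1


def cfm_loss(velocity_fn, x1, cond, rng):
    """Monte Carlo estimate of a conditional flow matching loss.

    velocity_fn(x_t, t, cond) is a hypothetical model returning a velocity
    with the same shape as x_t; cond holds per-cell condition encodings.
    """
    n = x1.shape[0]
    x0 = rng.standard_normal(x1.shape)   # Gaussian source samples
    t = rng.uniform(size=(n, 1))         # one time per cell, broadcast over features
    x_t = interpolate(x0, x1, t)
    target = x1 - x0                     # constant velocity of the linear path
    pred = velocity_fn(x_t, t, cond)
    # Squared error summed over features, averaged over cells.
    return float(np.mean(np.sum((pred - target) ** 2, axis=1)))


def zero_model(x_t, t, cond):
    """Trivial stand-in model that always predicts zero velocity."""
    return np.zeros_like(x_t)


rng = np.random.default_rng(0)
x1 = rng.standard_normal((128, 10))      # stand-in for 128 cells x 10 dims
cond = np.zeros((128, 3))                # stand-in condition encodings
loss = cfm_loss(zero_model, x1, cond, rng)
```

A trained model would replace `zero_model` and be fitted by minimizing this loss; the condition encodings are simply passed through, which is where attributes such as cell type or batch would enter.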

📝 Abstract
Generative modeling of single-cell RNA-seq data is crucial for tasks like trajectory inference, batch effect removal, and simulation of realistic cellular data. However, recent deep generative models simulating synthetic single cells from noise operate on pre-processed continuous gene expression approximations, overlooking the discrete nature of single-cell data, which limits their effectiveness and hinders the incorporation of robust noise models. Additionally, aspects like controllable multi-modal and multi-label generation of cellular data remain underexplored. This work introduces CellFlow for Generation (CFGen), a flow-based conditional generative model that preserves the inherent discreteness of single-cell data. CFGen generates whole-genome multi-modal single-cell data reliably, improving the recovery of crucial biological data characteristics while tackling relevant generative tasks such as rare cell type augmentation and batch correction. We also introduce a novel framework for compositional data generation using Flow Matching. By showcasing CFGen on a diverse set of biological datasets and settings, we provide evidence of its value to the fields of computational biology and deep generative models.
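
The abstract argues that continuous approximations make it hard to use proper noise models for count data. To show what a discrete noise model looks like, the sketch below uses a negative binomial distribution with mean `mu` and inverse dispersion `theta`, a common choice for UMI counts. Whether CFGen uses exactly this parametrization is an assumption here, and `library_size` and `gene_fractions` are hypothetical values.

```python
import math

import numpy as np

# Vectorized log-gamma built from the standard library.
_lgamma = np.vectorize(math.lgamma)


def nb_params(mu, theta):
    """Map mean mu and inverse dispersion theta to NumPy's (n, p) parametrization."""
    return theta, theta / (theta + mu)


def sample_counts(mu, theta, rng):
    """Draw integer counts with mean mu and variance mu + mu**2 / theta."""
    n, p = nb_params(mu, theta)
    return rng.negative_binomial(n, p)


def nb_log_pmf(x, mu, theta):
    """Negative binomial log-probability of integer counts x."""
    x = np.asarray(x, dtype=float)
    return (_lgamma(x + theta) - _lgamma(theta) - _lgamma(x + 1.0)
            + theta * np.log(theta / (theta + mu))
            + x * np.log(mu / (theta + mu)))


rng = np.random.default_rng(0)
# Hypothetical decoder output: per-gene means scaled by a cell's library size.
library_size = 2000.0
gene_fractions = np.array([0.5, 0.3, 0.15, 0.05])
mu = library_size * gene_fractions
theta = np.full_like(mu, 2.0)
counts = sample_counts(mu, theta, rng)   # non-negative integers, one per gene
```

Because the samples are drawn from a count distribution rather than rounded from a continuous output, they are integers by construction, and `nb_log_pmf` gives a likelihood that can be evaluated directly on raw counts.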
Problem

Research questions and friction points this paper is trying to address.

Generative modeling of single-cell RNA-seq data for biological tasks.
Overcoming the reliance of deep generative models on continuous approximations of discrete count data.
Enabling controllable multi-modal and multi-label cellular data generation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

CFGen preserves single-cell data discreteness
Generates multi-modal, multi-label cellular data
Uses Flow Matching for compositional data generation
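
The summary does not spell out how several attributes are combined at sampling time. One widely used way to compose conditions in flow and diffusion models is to add weighted differences between attribute-conditional and unconditional velocity fields, in the spirit of classifier-free guidance. The sketch below shows that idea with toy linear velocity fields. Treating this as CFGen's exact compositional rule is an assumption, and `compose_velocity`, `euler_sample`, and the toy centres are hypothetical.

```python
import numpy as np


def compose_velocity(v_uncond, v_conds, weights):
    """Combine velocities for several attributes, classifier-free-guidance style.

    v_uncond: velocity predicted without conditions, shape (n, d).
    v_conds: list of velocities, each predicted with one attribute, shape (n, d).
    weights: one guidance weight per attribute.
    """
    v = np.array(v_uncond, dtype=float)
    for w, v_c in zip(weights, v_conds):
        v = v + w * (np.asarray(v_c) - v_uncond)
    return v


def euler_sample(x0, velocity_at, n_steps=100):
    """Integrate dx/dt = velocity_at(x, t) from t=0 to t=1 with explicit Euler."""
    x, dt = np.array(x0, dtype=float), 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity_at(x, i * dt)
    return x


# Toy fields (hypothetical): each attribute pulls samples toward its own centre.
centre_a, centre_b = np.array([3.0, 0.0]), np.array([0.0, 3.0])


def guided(x, t):
    v_u = -x                                   # unconditional: pull toward origin
    v_a, v_b = centre_a - x, centre_b - x      # per-attribute conditional fields
    return compose_velocity(v_u, [v_a, v_b], weights=[1.0, 1.0])


samples = euler_sample(np.zeros((5, 2)), guided)
```

With both weights set to 1, each attribute adds its own shift relative to the unconditional field, so the toy samples move toward the combined point (3, 3) rather than toward either centre alone; raising or lowering a weight strengthens or weakens the corresponding attribute.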
Alessandro Palma
Helmholtz Munich, Technical University of Munich
Till Richter
Helmholtz Munich, Technical University of Munich
Hanyi Zhang
University of Heidelberg
Deep Learning, Biomedical Imaging
Manuel Lubetzki
Helmholtz Munich, Technical University of Munich
Alexander Tong
Aithyra
Flow Models, Deep Learning, Optimal Transport, Single-cell, Protein design
Andrea Dittadi
Helmholtz AI | Technical University of Munich
generative models, representation learning, machine learning, deep learning
Fabian J. Theis
Helmholtz Munich, Technical University of Munich