InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference

📅 2025-03-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Supervised gene regulatory network (GRN) inference methods rely on costly ground-truth (GT) labels and risk learning gene-specific biases, such as class imbalance among GT interactions, rather than true regulatory mechanisms. To address this, the paper proposes InfoSEM, an unsupervised generative GRN inference framework. It uses textual gene embeddings (e.g., from PubMedBERT) as informative priors that bring in external biological knowledge, jointly models gene expression and regulatory topology via a variational autoencoder coupled with differentiable graph learning, and can incorporate GT labels as an additional prior when they are available. The paper also introduces a biologically motivated benchmarking framework aligned with downstream tasks (e.g., biomarker discovery) that exposes the learned biases of supervised approaches. Across four benchmark datasets, InfoSEM outperforms existing models by 38.5% using the textual-embedding prior alone, and integrating labeled data as a prior improves performance by a further 11.1%.

📝 Abstract

Inferring Gene Regulatory Networks (GRNs) from gene expression data is crucial for understanding biological processes. While supervised models are reported to achieve high performance for this task, they rely on costly ground truth (GT) labels and risk learning gene-specific biases, such as class imbalances of GT interactions, rather than true regulatory mechanisms. To address these issues, we introduce InfoSEM, an unsupervised generative model that leverages textual gene embeddings as informative priors, improving GRN inference without GT labels. InfoSEM can also integrate GT labels as an additional prior when available, avoiding biases and further enhancing performance. Additionally, we propose a biologically motivated benchmarking framework that better reflects real-world applications such as biomarker discovery and reveals learned biases of existing supervised methods. InfoSEM outperforms existing models by 38.5% across four datasets using a textual-embedding prior and further boosts performance by 11.1% when integrating labeled data as priors.
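To make the idea of an "informative prior" concrete, here is a minimal sketch of how gene embeddings could shape an edge estimate. It is not InfoSEM itself, which is a deep generative model. It swaps in a plain linear structural equation model fitted by MAP estimation with a Gaussian prior, and every name in it (`cosine_similarity_matrix`, `map_linear_sem`, `lam`) is hypothetical. The assumption is that each gene's expression is a linear function of the other genes, with a prior mean on the edge weights taken from the cosine similarity of the gene embeddings.

```python
import numpy as np


def cosine_similarity_matrix(emb):
    """Pairwise cosine similarity between rows of an embedding matrix."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit.T


def map_linear_sem(X, prior_mean, lam=1.0):
    """MAP edge weights for a linear SEM with a Gaussian prior (illustrative).

    X          : (cells, genes) expression matrix
    prior_mean : (genes, genes) prior mean for the weight of edge i -> j
    lam        : prior precision; larger values trust the prior more
    Returns A with A[i, j] = estimated effect of gene i on gene j and a
    zero diagonal (no self-regulation in this sketch).
    """
    n_genes = X.shape[1]
    A = np.zeros((n_genes, n_genes))
    for j in range(n_genes):
        others = [i for i in range(n_genes) if i != j]
        Xo = X[:, others]
        m = prior_mean[others, j]
        # Ridge regression shrunk toward m instead of toward zero.
        lhs = Xo.T @ Xo + lam * np.eye(len(others))
        rhs = Xo.T @ X[:, j] + lam * m
        A[others, j] = np.linalg.solve(lhs, rhs)
    return A


# Synthetic stand-ins: random "text embeddings" and random expression data.
rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 8))    # 5 genes, 8-dimensional embeddings
X = rng.normal(size=(50, 5))     # 50 cells, 5 genes
prior = cosine_similarity_matrix(emb)
A = map_linear_sem(X, prior, lam=10.0)
```

With a small `lam` the estimate is driven by the expression data; as `lam` grows, the off-diagonal weights approach the embedding-derived prior mean. That trade-off is the general mechanism by which a prior can substitute for missing labels.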

Problem

Research questions and friction points this paper is trying to address.

Inferring Gene Regulatory Networks without costly ground truth labels.

Leveraging textual gene embeddings to improve GRN inference accuracy.

Proposing a benchmarking framework to evaluate real-world GRN applications.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised generative model with textual gene embeddings

Optionally integrates ground truth labels as an additional prior while avoiding label biases

Biologically motivated benchmarking framework for real-world applications
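The gene-specific bias mentioned in the abstract can be illustrated with a small synthetic experiment. The sketch below is an assumption-laden toy and not the paper's benchmark: the data are simulated, the "degree baseline" is a hypothetical stand-in for a biased supervised model, and the held-out-TF split is just one possible way to probe the bias. The baseline never looks at expression data. It scores every candidate edge by how often that transcription factor (TF) appeared in positive training labels.

```python
import numpy as np


def auroc(scores, labels):
    """AUROC from pairwise comparisons; ties count as half a win."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return wins + 0.5 * ties


# Simulated ground truth: a few hub TFs regulate many targets, most do not.
rng = np.random.default_rng(0)
n_tf, n_target = 20, 200
edge_prob = np.full(n_tf, 0.02)
edge_prob[:4] = 0.5                      # first four TFs are hubs
labels = rng.random((n_tf, n_target)) < edge_prob[:, None]

# Split 1: random edge split, so every TF is seen during training.
in_train = rng.random(labels.shape) < 0.5
train_pos = (labels & in_train).sum(axis=1)
train_n = in_train.sum(axis=1)
tf_rate = train_pos / np.maximum(train_n, 1)   # per-TF positive rate
scores = np.broadcast_to(tf_rate[:, None], labels.shape)
auc_random_split = auroc(scores[~in_train], labels[~in_train])

# Split 2: held-out TFs. The baseline has no labels for them, so every
# test edge receives the same score.
test_tfs = np.arange(n_tf) % 2 == 0
unseen_scores = np.zeros((int(test_tfs.sum()), n_target))
auc_unseen_tf = auroc(unseen_scores.ravel(), labels[test_tfs].ravel())

print(f"random edge split AUROC: {auc_random_split:.3f}")
print(f"held-out TF split AUROC: {auc_unseen_tf:.3f}")
```

Under the random edge split, the baseline ranks test edges well above chance purely from label frequencies, whereas on held-out TFs it collapses to chance. A benchmark that only uses the first kind of split can therefore reward memorising which genes are frequently labelled rather than learning regulation.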

🔎 Similar Papers

A deep graph model for the signed interaction prediction in biological network