GRAPE: Heterogeneous Graph Representation Learning for Genetic Perturbation with Coding and Non-Coding Biotype

📅 2025-05-06

🤖 AI Summary

Existing gene regulatory network (GRN) inference methods overlook functional heterogeneity among genes, such as coding versus non-coding roles, and rely solely on coarse-grained metrics, which limits their ability to capture biologically realistic regulatory mechanisms. To address this, the authors propose GRAPE, a biotype-aware heterogeneous graph neural network that, according to the paper, is the first to explicitly incorporate gene biotype information into genetic perturbation effect prediction. GRAPE initializes node representations with dual-modality features: DNA sequence embeddings from the Nucleotide Transformer and semantic embeddings of gene descriptions derived from a large language model. It further employs dynamic graph structure learning to refine the heterogeneous GRN topology. This design enables biotype-specific modeling of gene function and joint learning of implicit regulatory interactions. Evaluated on multiple public perturbation datasets, GRAPE is reported to achieve state-of-the-art performance in perturbation effect prediction.
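The dual-modality initialization described above can be illustrated with a minimal sketch. It assumes each gene already has two precomputed vectors, one from a DNA sequence model and one from a language model over its description, and it fuses them by L2-normalizing each and concatenating. The fusion rule, the function names, and the toy vectors are illustrative assumptions; the summary does not state how the two modalities are combined.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def init_gene_features(seq_emb, desc_emb):
    """Concatenate normalized sequence and description embeddings per gene.

    Concatenation is an assumed fusion rule, used here only for illustration.
    Genes missing either modality are skipped.
    """
    features = {}
    for gene in seq_emb:
        if gene not in desc_emb:
            continue
        features[gene] = l2_normalize(seq_emb[gene]) + l2_normalize(desc_emb[gene])
    return features


# Hypothetical toy embeddings; real ones would have hundreds of dimensions.
seq_emb = {"GENE_A": [3.0, 4.0], "GENE_B": [1.0, 0.0]}
desc_emb = {"GENE_A": [0.0, 2.0, 0.0], "GENE_B": [1.0, 1.0, 1.0]}
features = init_gene_features(seq_emb, desc_emb)
```

Normalizing each modality before concatenation keeps one embedding from dominating the other purely because of scale, which is one common reason to prefer this over raw concatenation.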

📝 Abstract

Predicting the effects of genetic perturbations enables the identification of potentially crucial genes prior to wet-lab experiments, significantly improving overall experimental efficiency. Since genes are the foundation of cellular life, building gene regulatory networks (GRNs) is essential to understand and predict the effects of genetic perturbations. However, current methods fail to fully leverage gene-related information and rely solely on simple evaluation metrics to construct coarse-grained GRNs. More importantly, they ignore functional differences between biotypes, limiting their ability to capture potential gene interactions. In this work, we leverage a pre-trained large language model and a DNA sequence model to extract features from gene descriptions and DNA sequence data, respectively, which serve as the initialization for gene representations. Additionally, we introduce gene biotype information for the first time in genetic perturbation prediction, simulating the distinct roles of genes with different biotypes in regulating cellular processes, while capturing implicit gene relationships through graph structure learning (GSL). We propose GRAPE, a heterogeneous graph neural network (HGNN) that leverages gene representations initialized with features from descriptions and sequences, models the distinct roles of genes with different biotypes, and dynamically refines the GRN through GSL. Results on publicly available datasets show that our method achieves state-of-the-art performance.
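To make the idea of refining a GRN through graph structure learning concrete, here is a hedged sketch of one widely used heuristic: keep the prior edges and add, for each gene, a directed edge to its k most similar genes by cosine similarity of their embeddings. This is a generic similarity-based scheme chosen for illustration, not the GSL procedure of the paper, which the abstract does not specify; all names and values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def refine_edges(embeddings, prior_edges, k=1):
    """Return the prior edges plus a directed edge from each gene to its top-k neighbors."""
    edges = set(prior_edges)
    genes = sorted(embeddings)
    for g in genes:
        scored = [(cosine(embeddings[g], embeddings[h]), h) for h in genes if h != g]
        # Highest similarity first; ties are broken by gene name for determinism.
        scored.sort(key=lambda t: (-t[0], t[1]))
        for _, h in scored[:k]:
            edges.add((g, h))
    return edges


# Hypothetical toy embeddings and a single prior regulatory edge.
embeddings = {
    "GENE_A": [1.0, 0.0],
    "GENE_B": [0.9, 0.1],
    "GENE_C": [0.0, 1.0],
}
prior_edges = {("GENE_A", "GENE_C")}
edges = refine_edges(embeddings, prior_edges, k=1)
```

In a trained model the embeddings change during optimization, so recomputing the neighbor set every few steps is what makes such a refinement dynamic rather than a one-off preprocessing pass.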

Problem

Research questions and friction points this paper is trying to address.

Predict the effects of genetic perturbations to identify crucial genes before wet-lab experiments

Improve gene regulatory networks by leveraging gene-related information

Model distinct roles of genes with different biotypes

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses pre-trained models for gene feature extraction

Incorporates gene biotype information to model the distinct regulatory roles of coding and non-coding genes

Employs heterogeneous graph neural network (HGNN)
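The combination of biotype information with a heterogeneous graph neural network can be sketched as a single message-passing step in which each message is transformed by a weight matrix selected by the source gene's biotype, then averaged at the target. This is a minimal, assumption-laden illustration: the two biotype labels, the fixed weight matrices, and mean aggregation are all hypothetical choices, and a real model would learn the weights and typically add nonlinearities and self-connections.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def hetero_layer(features, biotype, edges, weights):
    """One mean-aggregation step with a separate weight matrix per source biotype.

    Genes with no incoming edges keep their current features.
    """
    out = {}
    for target in features:
        messages = [
            matvec(weights[biotype[src]], features[src])
            for src, dst in edges
            if dst == target
        ]
        if not messages:
            out[target] = list(features[target])
            continue
        dim = len(messages[0])
        out[target] = [sum(m[i] for m in messages) / len(messages) for i in range(dim)]
    return out


# Hypothetical toy graph: one coding and one non-coding gene regulate GENE_C.
features = {"GENE_A": [1.0, 2.0], "LNC_B": [3.0, 4.0], "GENE_C": [0.0, 0.0]}
biotype = {"GENE_A": "coding", "LNC_B": "non_coding", "GENE_C": "coding"}
edges = [("GENE_A", "GENE_C"), ("LNC_B", "GENE_C")]
weights = {
    "coding": [[1.0, 0.0], [0.0, 1.0]],      # identity, for illustration only
    "non_coding": [[0.0, 1.0], [1.0, 0.0]],  # swaps the two coordinates
}
updated = hetero_layer(features, biotype, edges, weights)
```

Keeping separate parameters per biotype is what distinguishes this from a homogeneous layer, where coding and non-coding genes would share one transformation.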
