PhenoKG: Knowledge Graph-Driven Gene Discovery and Patient Insights from Phenotypes Alone

📅 2025-06-16

🤖 AI Summary

Background: Identifying causal genes solely from patient phenotypes remains a critical challenge in precision medicine. Method: We propose an end-to-end, phenotype-only gene discovery framework. It constructs a rare disease knowledge graph and integrates graph neural networks (GNNs) with Transformers to jointly model the graph's structural relations and the semantic representations of phenotypes. To compensate for absent genomic data, we introduce multi-hop relational reasoning and phenotype embedding alignment. Results: On the real-world MyGene2 dataset, our method achieves an MRR of 24.64% and an nDCG@100 of 33.64%, outperforming the strongest baseline, SHEPHERD (19.02% MRR, 30.54% nDCG@100). Notably, it can predict causal genes with or without a predefined candidate gene list, offering a scalable, data-efficient approach to phenotype-driven rare disease diagnosis.
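The difference between ranking with and without a candidate gene list can be illustrated with a minimal Python sketch. It assumes a precomputed dictionary of per-gene scores; the `rank_genes` helper, the `GENE_*` names, and the score values are hypothetical placeholders, not outputs of the model. With no candidate list, every gene node in the knowledge graph is ranked; with one, only the listed genes are.

```python
from typing import Dict, List, Optional, Sequence


def rank_genes(scores: Dict[str, float],
               candidates: Optional[Sequence[str]] = None) -> List[str]:
    """Return gene IDs ordered from most to least likely causal.

    Hypothetical helper: `scores` maps every gene node in the KG to a model
    score. If `candidates` is None, all genes are ranked (open setting);
    otherwise only candidates that appear in `scores` are ranked.
    """
    if candidates is None:
        pool = list(scores)
    else:
        pool = [g for g in candidates if g in scores]
    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(pool, key=lambda g: (-scores[g], g))


# Placeholder scores for illustration only.
scores = {"GENE_A": 0.9, "GENE_B": 0.7, "GENE_C": 0.4, "GENE_D": 0.2}
print(rank_genes(scores))                        # ['GENE_A', 'GENE_B', 'GENE_C', 'GENE_D']
print(rank_genes(scores, ["GENE_D", "GENE_C"]))  # ['GENE_C', 'GENE_D']
```

In the open setting the causal gene has to compete against every gene in the graph rather than a short shortlist, which generally makes that setting the harder of the two.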

📝 Abstract

Identifying causative genes from patient phenotypes remains a significant challenge in precision medicine, with important implications for the diagnosis and treatment of genetic disorders. We propose a novel graph-based approach for predicting causative genes from patient phenotypes, with or without an available list of candidate genes, by integrating a rare disease knowledge graph (KG). Our model, combining graph neural networks and transformers, achieves substantial improvements over the current state-of-the-art. On the real-world MyGene2 dataset, it attains a mean reciprocal rank (MRR) of 24.64% and nDCG@100 of 33.64%, surpassing the best baseline (SHEPHERD) at 19.02% MRR and 30.54% nDCG@100. We perform extensive ablation studies to validate the contribution of each model component. Notably, the approach generalizes to cases where only phenotypic data are available, addressing key challenges in clinical decision support when genomic information is incomplete.
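The two reported metrics can be computed as in the sketch below, which follows the standard definitions with binary relevance (a gene is either causal for the patient or not). How the paper handles ties or patients with several causal genes is not stated here, so those details are assumptions. MRR averages 1/rank of the first causal gene over patients; nDCG@k divides the discounted gain of causal genes in the top k by the best achievable value. On this 0 to 1 scale, the reported 24.64% MRR corresponds to 0.2464. The gene names and toy patients are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple


def reciprocal_rank(ranking: Sequence[str], causal: Set[str]) -> float:
    """1 / (1-based rank of the first causal gene), or 0 if none is ranked."""
    for rank, gene in enumerate(ranking, start=1):
        if gene in causal:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranking: Sequence[str], causal: Set[str], k: int = 100) -> float:
    """Binary-relevance nDCG@k: DCG of the top k divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, gene in enumerate(ranking[:k], start=1)
              if gene in causal)
    ideal_hits = min(len(causal), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def evaluate(cases: List[Tuple[Sequence[str], Set[str]]],
             k: int = 100) -> Tuple[float, float]:
    """Average MRR and nDCG@k over (ranking, causal_genes) pairs, one per patient."""
    n = len(cases)
    mrr = sum(reciprocal_rank(r, c) for r, c in cases) / n
    ndcg = sum(ndcg_at_k(r, c, k) for r, c in cases) / n
    return mrr, ndcg


# Two toy patients: the causal gene is ranked 2nd, then 1st.
cases = [(["GENE_A", "GENE_B", "GENE_C"], {"GENE_B"}),
         (["GENE_B", "GENE_A"], {"GENE_B"})]
mrr, ndcg = evaluate(cases, k=100)
print(f"MRR = {mrr:.4f}, nDCG@100 = {ndcg:.4f}")  # MRR = 0.7500, nDCG@100 = 0.8155
```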

Problem

Identifying causative genes from patient phenotypes in precision medicine

Predicting causative genes using rare disease knowledge graphs

Addressing clinical decision challenges with incomplete genomic data

Innovation

Graph-based approach for gene prediction

Combines graph neural networks and transformers

Integrates rare disease knowledge graph
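To make the combination of graph neural networks and transformers more concrete, here is a toy, untrained sketch in pure Python. It is not the paper's architecture: the node names, the parameter-free mean-aggregation layer (standing in for a learned GNN layer), and the identity-projection self-attention (standing in for a Transformer encoder) are all illustrative assumptions. The GNN step lets each node absorb information from its knowledge-graph neighbours; the attention step lets a patient's phenotypes attend to one another before they are pooled into a single patient vector, which is then scored against each gene embedding by dot product.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def build_adjacency(edges: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Undirected adjacency lists from (gene, phenotype) association edges."""
    adj: Dict[str, List[str]] = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def mean_aggregate_layer(emb: Dict[str, Vector],
                         adj: Dict[str, List[str]]) -> Dict[str, Vector]:
    """One parameter-free message-passing step: average self and neighbour mean, then ReLU."""
    out: Dict[str, Vector] = {}
    for node, h in emb.items():
        nbrs = adj.get(node, [])
        msg = ([sum(emb[n][d] for n in nbrs) / len(nbrs) for d in range(len(h))]
               if nbrs else [0.0] * len(h))
        out[node] = [max(0.0, 0.5 * a + 0.5 * b) for a, b in zip(h, msg)]
    return out


def self_attention_pool(vectors: List[Vector]) -> Vector:
    """Single-head scaled dot-product self-attention (identity Q/K/V), then mean pooling."""
    d = len(vectors[0])
    attended: List[Vector] = []
    for q in vectors:
        logits = [sum(x * y for x, y in zip(q, k)) / math.sqrt(d) for k in vectors]
        top = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(z - top) for z in logits]
        total = sum(weights)
        attended.append([sum(w * v[i] for w, v in zip(weights, vectors)) / total
                         for i in range(d)])
    return [sum(a[i] for a in attended) / len(attended) for i in range(d)]


def score_genes(phenotypes: List[str], emb: Dict[str, Vector],
                genes: List[str]) -> Dict[str, float]:
    """Dot-product score between the pooled patient vector and each gene embedding."""
    patient = self_attention_pool([emb[p] for p in phenotypes])
    return {g: sum(x * y for x, y in zip(patient, emb[g])) for g in genes}


# Toy KG: hypothetical gene-phenotype associations and 2-d starting embeddings.
edges = [("GENE_A", "HP_1"), ("GENE_A", "HP_2"), ("GENE_B", "HP_3")]
emb = {"HP_1": [1.0, 0.0], "HP_2": [0.8, 0.2], "HP_3": [0.0, 1.0],
       "GENE_A": [0.0, 0.0], "GENE_B": [0.0, 0.0]}
emb = mean_aggregate_layer(emb, build_adjacency(edges))
scores = score_genes(["HP_1", "HP_2"], emb, ["GENE_A", "GENE_B"])
print(max(scores, key=scores.get))  # GENE_A
```

In a trained model the aggregation and attention steps would typically carry learned weight matrices and, for a knowledge graph with several relation types, relation-specific parameters; those are omitted here to keep the sketch short.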
