🤖 AI Summary
Identifying causal genes from patient phenotypes alone remains a critical challenge in precision medicine. Method: We propose an end-to-end, phenotype-only gene discovery framework. It constructs a rare disease knowledge graph and integrates graph neural networks (GNNs) with Transformers to jointly model structural relations and learn semantic phenotype representations. To compensate for absent genomic data, we introduce multi-hop relational reasoning and phenotype embedding alignment. Results: On the real-world MyGene2 dataset, our method achieves an MRR of 24.64% and nDCG@100 of 33.64%, outperforming the strongest baseline, SHEPHERD (19.02% MRR, 30.54% nDCG@100). Notably, it supports gene prediction without a predefined candidate gene list, offering a scalable, data-efficient approach to phenotype-driven rare disease diagnosis.
📝 Abstract
Identifying causative genes from patient phenotypes remains a significant challenge in precision medicine, with important implications for the diagnosis and treatment of genetic disorders. We propose a novel graph-based approach that predicts causative genes from patient phenotypes, with or without an available list of candidate genes, by integrating a rare disease knowledge graph (KG). Our model combines graph neural networks and transformers and improves on the current state of the art. On the real-world MyGene2 dataset, it attains a mean reciprocal rank (MRR) of 24.64% and nDCG@100 of 33.64%, compared with 19.02% MRR and 30.54% nDCG@100 for the best baseline (SHEPHERD). We perform extensive ablation studies to validate the contribution of each model component. Notably, the approach generalizes to cases where only phenotypic data are available, addressing key challenges in clinical decision support when genomic information is incomplete.
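For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how MRR and nDCG@k are commonly computed for a ranked gene list. It assumes binary relevance (a gene is either causal for the patient or not) and a ranking with no repeated genes; the abstract does not describe the exact evaluation protocol, so this may differ from the authors' implementation. The function names and the gene labels in the usage note are hypothetical.

```python
import math


def reciprocal_rank(ranked_genes, causal_gene):
    """Return 1/rank of the causal gene, or 0.0 if it is not in the ranking."""
    for position, gene in enumerate(ranked_genes, start=1):
        if gene == causal_gene:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(cases):
    """Average reciprocal rank over (ranked_genes, causal_gene) pairs."""
    if not cases:
        return 0.0
    return sum(reciprocal_rank(r, g) for r, g in cases) / len(cases)


def ndcg_at_k(ranked_genes, causal_genes, k=100):
    """nDCG@k with binary relevance (assumption: each causal gene has gain 1)."""
    causal = set(causal_genes)
    # Discounted gain of every causal gene found in the top k positions.
    dcg = sum(
        1.0 / math.log2(position + 1)
        for position, gene in enumerate(ranked_genes[:k], start=1)
        if gene in causal
    )
    # Ideal ranking places all causal genes at the top of the list.
    ideal_hits = min(len(causal), k)
    idcg = sum(1.0 / math.log2(position + 1) for position in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```

As a usage example, `reciprocal_rank(["GENE_A", "GENE_B", "GENE_C"], "GENE_B")` evaluates to 0.5 because the causal gene sits at rank 2. Under this definition, an MRR of 24.64% corresponds to an average reciprocal rank of about 0.25 across patients.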