VitaGraph: Building a Knowledge Graph for Biologically Relevant Learning Tasks

📅 2025-05-16
📈 Citations: 0
Influential: 0
📄 PDF

career value

185K/year
🤖 AI Summary
To address low-quality knowledge graphs, poor interpretability, and weak generalization in biomedical prediction tasks, this work introduces VitaGraph—a high-quality, multi-source integrated biomedical knowledge graph. We systematically curate the Drug Repurposing Knowledge Graph (DRKG) and integrate interpretable biological features—including molecular fingerprints and Gene Ontology (GO) annotations—to achieve heterogeneous data alignment and redundancy removal. We further propose a link prediction framework that jointly leverages graph neural network pretraining and interpretable feature embedding. Evaluated on drug repositioning, protein–protein interaction prediction, and polypharmacy side effect prediction, our approach achieves state-of-the-art performance across all three tasks, significantly enhancing biological plausibility and generalizability. VitaGraph is the first open-source, fully reproducible, benchmark-ready knowledge graph platform explicitly designed for precision medicine.

Technology Category

Application Category

📝 Abstract
The intrinsic complexity of human biology presents ongoing challenges to scientific understanding. Researchers collaborate across disciplines to expand our knowledge of the biological interactions that define human life. AI methodologies have emerged as powerful tools across scientific domains, particularly in computational biology, where graph data structures effectively model biological entities such as protein-protein interaction (PPI) networks and gene functional networks. Those networks are used as datasets for paramount network medicine tasks, such as gene-disease association prediction, drug repurposing, and polypharmacy side effect studies. Reliable predictions from machine learning models require high-quality foundational data. In this work, we present a comprehensive multi-purpose biological knowledge graph constructed by integrating and refining multiple publicly available datasets. Building upon the Drug Repurposing Knowledge Graph (DRKG), we define a pipeline tasked with a) cleaning inconsistencies and redundancies present in DRKG, b) coalescing information from the main available public data sources, and c) enriching the graph nodes with expressive feature vectors such as molecular fingerprints and gene ontologies. Biologically and chemically relevant features improve the capacity of machine learning models to generate accurate and well-structured embedding spaces. The resulting resource represents a coherent and reliable biological knowledge graph that serves as a state-of-the-art platform to advance research in computational biology and precision medicine. Moreover, it offers the opportunity to benchmark graph-based machine learning and network medicine models on relevant tasks. We demonstrate the effectiveness of the proposed dataset by benchmarking it against the task of drug repurposing, PPI prediction, and side-effect prediction, modeled as link prediction problems.
Problem

Research questions and friction points this paper is trying to address.

Constructing a high-quality biological knowledge graph for computational biology
Integrating and refining diverse public datasets to improve data reliability
Enhancing machine learning models for drug repurposing and disease prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrating and refining multiple public biological datasets
Enriching graph nodes with expressive feature vectors
Benchmarking graph-based models on key biomedical tasks
💼 Related Jobs
Postdoctoral Fellow – AI-Driven Multi-Omics Integration for Predictive Toxicology
Pfizer
The annual base salary for this position ranges from $64,600.00 to $107,600.00. In addition, this position is eligible for participation in Pfizer’s Global Performance Plan with a bonus target of 7.5% of the base salary. We offer comprehensive and generous benefits and programs to help our colleagues lead healthy lives and to support each of life’s moments. Benefits offered include a 401(k) plan with Pfizer Matching Contributions and an additional Pfizer Retirement Savings Contribution, paid vacation, holiday and personal days, paid caregiver/parental and medical leave, and health benefits to include medical, prescription drug, dental and vision coverage. Learn more at Pfizer Candidate Site – U.S. Benefits | (uscandidates.mypfizerbenefits.com). Pfizer compensation structures and benefit packages are aligned based on the location of hire. The United States salary range provided does not apply to Tampa, FL or any location outside of the United States. Relocation assistance may be available based on business needs and/or eligibility.
Hybrid