EFGPP: Exploratory framework for genotype-phenotype prediction

📅 2026-05-02
🤖 AI Summary
This study addresses the challenge of improving prediction accuracy for complex phenotypes such as migraine by systematically integrating heterogeneous data sources: genotypes, polygenic risk scores (PRS), clinical covariates, and metabolomic features. The authors developed a reproducible analytical framework that uses PLINK, PRSice-2, AnnoPred, and LDAK-GWAS to generate and harmonize these data types, and evaluated whether features derived from a related trait (depression) help predict the target phenotype. Applied to 733 individuals from the UK Biobank, the integrated model raised the test AUC (area under the receiver operating characteristic curve) for migraine from 0.644, achieved by the best single data type, to 0.688 with migraine-focused inputs and 0.663 with depression-derived inputs. The results support multimodal data integration as a proof of concept for complex phenotype prediction.
📝 Abstract
Predicting complex human traits from genetic data is challenging because different genetic, clinical, and molecular data sources often contain different parts of the signal. Here, we present EFGPP, a reproducible framework for generating, ranking, and combining multiple types of data for genotype-to-phenotype prediction. We applied EFGPP to migraine prediction using UK Biobank data from 733 individuals. The framework combined genotype-derived features, principal components, clinical and metabolomic covariates, and polygenic risk scores generated from migraine and depression GWAS using PLINK, PRSice-2, AnnoPred, and LDAK-GWAS. The best single data type achieved a test AUC of 0.644, while combining multiple data types improved performance to 0.688 using migraine-focused inputs and 0.663 using cross-trait depression-derived inputs. Genetic features alone did not outperform the covariates-only baseline, but genotype-derived features performed better than PRS alone, and depression-derived PRS showed useful predictive signal. Overall, EFGPP provides a practical proof-of-concept framework for prioritising and integrating heterogeneous genetic data sources for complex phenotype prediction.
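The abstract describes ranking individual data types by test AUC and then combining them. The sketch below illustrates that general idea only; it is not the EFGPP implementation. It assumes each data type has already been reduced to one score per individual, uses synthetic scores rather than UK Biobank data, and combines data types by averaging z-scored scores, a simple late-fusion rule chosen here for illustration. The names `auc`, `zscore`, `rank_data_types`, `combine`, and the signal strengths are all hypothetical.

```python
import random
import statistics


def auc(labels, scores):
    """Rank-based AUC: fraction of case/control pairs ordered correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def zscore(values):
    """Standardize a score vector so data types on different scales can be averaged."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def rank_data_types(labels, scores_by_type):
    """Return (name, AUC) pairs sorted from best to worst single data type."""
    ranked = [(name, auc(labels, s)) for name, s in scores_by_type.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


def combine(scores_by_type, names):
    """Assumed fusion rule: per-individual mean of z-scored scores across data types."""
    columns = [zscore(scores_by_type[name]) for name in names]
    return [statistics.mean(row) for row in zip(*columns)]


# Synthetic stand-in data: labels and per-data-type scores are simulated,
# and the signal strengths below are arbitrary illustrative values.
rng = random.Random(0)
n_individuals = 733
labels = [rng.randint(0, 1) for _ in range(n_individuals)]
signal = {"covariates": 0.5, "genotype_features": 0.4, "migraine_prs": 0.2}
scores_by_type = {
    name: [strength * y + rng.gauss(0.0, 1.0) for y in labels]
    for name, strength in signal.items()
}

ranking = rank_data_types(labels, scores_by_type)
combined = combine(scores_by_type, [name for name, _ in ranking])

for name, value in ranking:
    print(f"{name}: AUC = {value:.3f}")
print(f"combined: AUC = {auc(labels, combined):.3f}")
```

In practice the ranking and the choice of which data types to combine would be made on training or validation data, with the test set held back for the final AUC, so that the reported improvement is not inflated by selection.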
Problem

Research questions and friction points this paper is trying to address.

genotype-phenotype prediction
complex traits
heterogeneous data integration
polygenic risk scores
migraine prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

genotype-phenotype prediction
data integration
polygenic risk score
cross-trait prediction
EFGPP framework
Muhammad Muneeb
Unknown affiliation
David B. Ascher
School of Chemistry and Molecular Biology, The University of Queensland, Brisbane, 4067, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, 3004, Australia