EnTao-GPM: DNA Foundation Model for Predicting the Germline Pathogenic Mutations

📅 2025-07-29
📈 Citations: 0
Influential: 0

🤖 AI Summary
In precision medicine, accurately distinguishing benign polymorphisms from pathogenic germline variants remains a critical challenge. To address this, we propose a framework that combines evolutionary information with interpretable AI. First, cross-species targeted pretraining on multi-organism genomic data (human, pig, and mouse) uses evolutionary conservation to improve pathogenicity modeling, especially in noncoding regions. Second, task-specific fine-tuning on ClinVar and HGMD specializes the model for germline variants, and the DNA foundation model is coupled with a large language model (LLM) to classify variants and generate statistically grounded, clinically interpretable explanations. Validated against ClinVar, our method is more accurate than state-of-the-art tools for both SNVs and non-SNV variants, including indels and splice-site alterations. The framework is intended to provide efficient, reliable computational support for automated genetic testing, clinical variant interpretation, and personalized therapeutic intervention.

📝 Abstract
Distinguishing pathogenic mutations from benign polymorphisms remains a critical challenge in precision medicine. EnTao-GPM, developed by Fudan University and BioMap, addresses this through three innovations: (1) Cross-species targeted pre-training on disease-relevant mammalian genomes (human, pig, mouse), leveraging evolutionary conservation to enhance interpretation of pathogenic motifs, particularly in non-coding regions; (2) Germline mutation specialization via fine-tuning on ClinVar and HGMD, improving accuracy for both SNVs and non-SNVs; (3) Interpretable clinical framework integrating DNA sequence embeddings with LLM-based statistical explanations to provide actionable insights. Validated against ClinVar, EnTao-GPM demonstrates superior accuracy in mutation classification. It revolutionizes genetic testing by enabling faster, more accurate, and accessible interpretation for clinical diagnostics (e.g., variant assessment, risk identification, personalized treatment) and research, advancing personalized medicine.
Problem

Research questions and friction points this paper is trying to address.

Distinguish pathogenic from benign DNA mutations
Improve accuracy in germline mutation classification
Enable interpretable clinical diagnostics for personalized medicine
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-species pre-training on disease-relevant genomes
Germline mutation specialization via ClinVar fine-tuning
Interpretable clinical framework with DNA sequence embeddings
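
The summary above does not say how variants are presented to the model. As a minimal sketch, assuming the common approach of comparing a reference window with an alternate window around each variant, the following shows how SNVs and non-SNVs (here, a deletion) can be turned into sequence pairs for a DNA model to embed. The names `Variant`, `variant_windows`, and `variant_kind`, the 0-based coordinate convention, and the `flank` parameter are all hypothetical and are not taken from EnTao-GPM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    pos: int  # 0-based offset of the first reference base (assumed convention)
    ref: str  # reference allele, e.g. "A" or "ACG"
    alt: str  # alternate allele, e.g. "G" (SNV) or "A" (deletion of "CG")


def variant_windows(sequence: str, variant: Variant, flank: int = 8) -> tuple[str, str]:
    """Return (reference_window, alternate_window) around the variant."""
    end = variant.pos + len(variant.ref)
    # Refuse to build windows when the stated reference allele is not
    # actually present at the given position.
    if sequence[variant.pos:end] != variant.ref:
        raise ValueError("reference allele does not match the sequence")
    left = sequence[max(0, variant.pos - flank):variant.pos]
    right = sequence[end:end + flank]
    return left + variant.ref + right, left + variant.alt + right


def variant_kind(variant: Variant) -> str:
    """Coarse label: single-nucleotide variant versus everything else."""
    if len(variant.ref) == 1 and len(variant.alt) == 1:
        return "SNV"
    return "non-SNV"


if __name__ == "__main__":
    seq = "ACGTACGTACGTACGT"
    for v in (Variant(4, "A", "G"), Variant(4, "ACG", "A")):
        print(variant_kind(v), variant_windows(seq, v, flank=3))
```

In a setup like this, both windows would be passed through the sequence model and their embeddings compared or concatenated before classification. Whether EnTao-GPM does this is not stated in the source.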
Zekai Lin
School of Basic Medical Science, Shanghai Medical College, Fudan University, Shanghai 200032, China; Intelligent Medicine Institute, Fudan University, Shanghai 200032, China
Haoran Sun
School of Basic Medical Science, Shanghai Medical College, Fudan University, Shanghai 200032, China; Intelligent Medicine Institute, Fudan University, Shanghai 200032, China
Yucheng Guo
Princeton University
Stochastic Analysis, Partial Differential Equations, Mathematical Finance
Yujie Yang
BioMap Research, Beijing 100086, China
Yanwen Wang
School of Basic Medical Science, Shanghai Medical College, Fudan University, Shanghai 200032, China; Intelligent Medicine Institute, Fudan University, Shanghai 200032, China
Bozhen Hu
PhD, Zhejiang University & Westlake University
Graph Neural Network, Protein Representation
Chonghang Ye
BioMap Research, Beijing 100086, China
Qirong Yang
BioMap Research, Beijing 100086, China
Fan Zhong
Intelligent Medicine Institute, Fudan University, Shanghai 200032, China
Xiaoming Zhang
BioMap Research, Beijing 100086, China
Lei Liu
Intelligent Medicine Institute, Fudan University, Shanghai 200032, China; Shanghai Institute of Infectious Disease and Biosecurity, Fudan University, Shanghai 200032, China; Shanghai Institute of Stem Cell Research and Clinical Translation, Shanghai 200120, China