EnTao-GPM: DNA Foundation Model for Predicting the Germline Pathogenic Mutations

📅 2025-07-29
🤖 AI Summary
In precision medicine, accurately distinguishing benign polymorphisms from pathogenic germline variants remains a critical challenge. To address this, we propose a novel framework integrating evolutionary information with interpretable AI: first, cross-species targeted pretraining on multi-organism genomic data leverages evolutionary conservation to improve pathogenicity modeling—especially in noncoding regions; second, task-specific fine-tuning on ClinVar and HGMD couples a DNA foundation model with a large language model (LLM) to jointly perform variant classification and generate statistically grounded, clinically interpretable explanations. Our method achieves significant performance gains over state-of-the-art tools on ClinVar, notably improving accuracy for both SNVs and non-SNV variants—including indels and splice-site alterations. The framework delivers efficient, reliable computational support for automated genetic testing, clinical variant interpretation, and personalized therapeutic intervention.

📝 Abstract
Distinguishing pathogenic mutations from benign polymorphisms remains a critical challenge in precision medicine. EnTao-GPM, developed by Fudan University and BioMap, addresses this through three innovations: (1) Cross-species targeted pre-training on disease-relevant mammalian genomes (human, pig, mouse), leveraging evolutionary conservation to enhance interpretation of pathogenic motifs, particularly in non-coding regions; (2) Germline mutation specialization via fine-tuning on ClinVar and HGMD, improving accuracy for both SNVs and non-SNVs; (3) Interpretable clinical framework integrating DNA sequence embeddings with LLM-based statistical explanations to provide actionable insights. Validated against ClinVar, EnTao-GPM demonstrates superior accuracy in mutation classification. It revolutionizes genetic testing by enabling faster, more accurate, and accessible interpretation for clinical diagnostics (e.g., variant assessment, risk identification, personalized treatment) and research, advancing personalized medicine.
Problem

Research questions and friction points this paper is trying to address.

Distinguish pathogenic from benign DNA mutations
Improve accuracy in germline mutation classification
Enable interpretable clinical diagnostics for personalized medicine
Innovation

Methods, ideas, or system contributions that make the work stand out.

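Cross-species targeted pre-training on disease-relevant mammalian genomes (human, pig, mouse)
Germline mutation specialization via fine-tuning on ClinVar and HGMD, covering both SNVs and non-SNVs
Interpretable clinical framework combining DNA sequence embeddings with LLM-based statistical explanations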
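
As a rough illustration of what scoring both SNVs and non-SNVs from sequence involves, the sketch below builds a reference/alternate sequence pair for a variant inside a fixed DNA window. This is not the EnTao-GPM pipeline: the `Variant` class and the helper names are hypothetical, and the assumption that a model would consume such ref/alt pairs is ours, not something the abstract states.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record, expressed relative to a sequence window."""
    pos: int  # 0-based offset of the first reference base within the window
    ref: str  # reference allele, e.g. "T" or "GTA"
    alt: str  # alternate allele, e.g. "C" or "G"


def variant_kind(v: Variant) -> str:
    """Label a variant as 'SNV' or 'non-SNV' from its allele lengths."""
    return "SNV" if len(v.ref) == 1 and len(v.alt) == 1 else "non-SNV"


def apply_variant(window: str, v: Variant) -> str:
    """Return the window with the reference allele replaced by the alternate."""
    end = v.pos + len(v.ref)
    # Guard against coordinate errors: the window must contain the stated
    # reference allele at the stated position.
    if window[v.pos:end].upper() != v.ref.upper():
        raise ValueError("reference allele does not match the window")
    return window[:v.pos] + v.alt + window[end:]


def ref_alt_pair(window: str, v: Variant) -> Tuple[str, str]:
    """Build the (reference, alternate) sequences a downstream model could embed."""
    return window, apply_variant(window, v)
```

The same function handles substitutions, insertions, and deletions because it splices alleles rather than swapping a single base; the reference check catches off-by-one coordinate mistakes before any sequence reaches a model.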
Authors

Zekai Lin
School of Basic Medical Science, Shanghai Medical College, Fudan University, Shanghai 200032, China; Intelligent Medicine Institute, Fudan University, Shanghai 200032, China
Haoran Sun
School of Basic Medical Science, Shanghai Medical College, Fudan University, Shanghai 200032, China; Intelligent Medicine Institute, Fudan University, Shanghai 200032, China
Yucheng Guo
Princeton University
Stochastic Analysis, Partial Differential Equations, Mathematical Finance
Yujie Yang
BioMap Research, Beijing 100086, China
Yanwen Wang
School of Basic Medical Science, Shanghai Medical College, Fudan University, Shanghai 200032, China; Intelligent Medicine Institute, Fudan University, Shanghai 200032, China
Bozhen Hu
PhD, Zhejiang University & Westlake University
Graph Neural Network, Protein Representation
Chonghang Ye
BioMap Research, Beijing 100086, China
Qirong Yang
BioMap Research, Beijing 100086, China
Fan Zhong
Intelligent Medicine Institute, Fudan University, Shanghai 200032, China
Xiaoming Zhang
BioMap Research, Beijing 100086, China
Lei Liu
Intelligent Medicine Institute, Fudan University, Shanghai 200032, China; Shanghai Institute of Infectious Disease and Biosecurity, Fudan University, Shanghai 200032, China; Shanghai Institute of Stem Cell Research and Clinical Translation, Shanghai 200120, China