PETRA: Pretrained Evolutionary Transformer for SARS-CoV-2 Mutation Prediction

📅 2025-11-06
🤖 AI Summary
SARS-CoV-2's continuous evolution and immune-escape mutations pose significant challenges to vaccine design and public health response. Two bottlenecks stand out: raw RNA sequences are noisy, and global sequencing data is biased in both space and time. To address them, this work introduces the first pre-trained Transformer model that takes evolutionary trajectories from phylogenetic trees as input instead of modeling raw sequences directly. It encodes branch paths and evolutionary distances, and uses a weighted loss framework to mitigate sampling bias. By combining evolutionary-biology priors with deep learning, the method achieves a weighted recall@1 of 9.45% at the nucleotide level and 17.10% at the spike protein amino-acid level for mutation prediction, compared with 0.49% and 6.64% for the best baseline. It also supports real-time mutation prediction for major clades such as 24F(XEC) and 25A(LP.8.1), offering an interpretable, deployable computational tool for prospective vaccine strain updates.
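As a rough illustration of what "taking evolutionary trajectories as input" could mean, the sketch below walks from a tree node back to the root and emits the mutations along that path in root-first order. The tree layout, the `trajectory` function, and the mutation labels are all hypothetical and chosen only for illustration; the summary does not specify PETRA's actual input encoding, and evolutionary distances are omitted here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy representation: node -> (parent, mutations on the branch into node).
Tree = Dict[str, Tuple[Optional[str], List[str]]]


def trajectory(tree: Tree, node: str) -> List[str]:
    """Return the mutations on the root-to-node path, ordered from root to node."""
    branches: List[List[str]] = []
    current: Optional[str] = node
    while current is not None:
        parent, mutations = tree[current]
        branches.append(mutations)
        current = parent
    # Branches were collected leaf-first, so reverse them to get root-first order.
    tokens: List[str] = []
    for mutations in reversed(branches):
        tokens.extend(mutations)
    return tokens


# Illustrative labels only; not taken from the paper's data.
toy_tree: Tree = {
    "root": (None, []),
    "A": ("root", ["C241T", "A23403G"]),
    "B": ("A", ["G28881A"]),
    "C": ("B", ["T22917G"]),
}

if __name__ == "__main__":
    print(trajectory(toy_tree, "C"))
```

A sequence model would then be trained to predict the next mutation given such a token list, which is one way a tree path can stand in for a raw genome.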

📝 Abstract
Since its emergence, SARS-CoV-2 has demonstrated a rapid and unpredictable evolutionary trajectory, characterized by the continual emergence of immune-evasive variants. This poses persistent challenges to public health and vaccine development. While large-scale generative pre-trained transformers (GPTs) have revolutionized the modeling of sequential data, their direct application to noisy viral genomic sequences is limited. In this paper, we introduce PETRA (Pretrained Evolutionary TRAnsformer), a novel transformer approach based on evolutionary trajectories derived from phylogenetic trees rather than raw RNA sequences. This method effectively mitigates sequencing noise and captures the hierarchical structure of viral evolution. With a weighted training framework to address substantial geographical and temporal imbalances in global sequence data, PETRA excels in predicting future SARS-CoV-2 mutations, achieving a weighted recall@1 of 9.45% for nucleotide mutations and 17.10% for spike amino-acid mutations, compared to 0.49% and 6.64% respectively for the best baseline. PETRA also demonstrates its ability to aid in the real-time mutation prediction of major clades like 24F(XEC) and 25A(LP.8.1). The code is open source at https://github.com/xz-keg/PETra
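The abstract reports a weighted recall@1 but does not define it here. The sketch below shows one plausible reading, stated as an assumption: each test case carries a weight, and a case counts as a hit when any of its top-k ranked predictions is among the mutations that actually occurred. The function name, argument layout, and example values are hypothetical.

```python
from typing import List, Sequence, Set


def weighted_recall_at_k(
    ranked_preds: Sequence[Sequence[str]],
    true_sets: Sequence[Set[str]],
    weights: Sequence[float],
    k: int = 1,
) -> float:
    """Weighted share of cases whose top-k predictions contain a true mutation.

    Assumed definition for illustration; the paper may define the metric differently.
    """
    if not (len(ranked_preds) == len(true_sets) == len(weights)):
        raise ValueError("inputs must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    hit_weight = 0.0
    for preds, truth, weight in zip(ranked_preds, true_sets, weights):
        if any(pred in truth for pred in preds[:k]):
            hit_weight += weight
    return hit_weight / total


if __name__ == "__main__":
    preds: List[List[str]] = [["m1", "m2"], ["m3", "m4"], ["m5", "m6"]]
    truth: List[Set[str]] = [{"m1"}, {"m4"}, {"m5"}]
    weights = [1.0, 2.0, 1.0]
    print(weighted_recall_at_k(preds, truth, weights, k=1))
```

Under this reading, up-weighting cases from under-sampled regions or periods keeps heavily sequenced regions from dominating the score.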
Problem

Research questions and friction points this paper is trying to address.

Predicting SARS-CoV-2 mutations using evolutionary trajectories from phylogenetic trees
Mitigating sequencing noise and capturing the hierarchical structure of viral evolution
Overcoming geographical and temporal imbalances in global sequence data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses evolutionary trajectories from phylogenetic trees
Employs weighted training for data imbalance
Predicts SARS-CoV-2 mutations with transformer architecture
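The weighted training idea in the list above can be sketched with a simple inverse-frequency scheme: samples from heavily sequenced (region, month) groups get smaller weights, and those weights scale a negative log-likelihood loss. This is a generic reweighting technique used as a stand-in; the page does not describe PETRA's actual weighting scheme, and the grouping key, function names, and example values are assumptions.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def inverse_frequency_weights(groups: Sequence[Hashable]) -> List[float]:
    """Weight each sample by 1 / (size of its group), rescaled to a mean of 1."""
    if not groups:
        raise ValueError("groups must not be empty")
    counts = Counter(groups)
    raw = [1.0 / counts[group] for group in groups]
    scale = len(raw) / sum(raw)
    return [value * scale for value in raw]


def weighted_nll(probs_of_truth: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of -log p(true label), given the model's probability of the truth."""
    if len(probs_of_truth) != len(weights):
        raise ValueError("inputs must have the same length")
    total = sum(weights)
    return -sum(w * math.log(p) for p, w in zip(probs_of_truth, weights)) / total


if __name__ == "__main__":
    # Hypothetical (region, month) labels: three samples from one group, one from another.
    groups = [("region_a", "2024-01")] * 3 + [("region_b", "2024-01")]
    weights = inverse_frequency_weights(groups)
    print(weights)
    print(weighted_nll([0.9, 0.8, 0.7, 0.2], weights))
```

With these weights each group contributes equally to the loss in total, so the single sample from the sparsely sequenced group counts as much as the three samples from the other group combined.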