BiLSTM-VHP: BiLSTM-Powered Network for Viral Host Prediction

📅 2025-09-14
🤖 AI Summary
This study addresses the difficulty of tracing the hosts of zoonotic viruses, and the resulting delays in early warning, by proposing a lightweight bidirectional long short-term memory (BiLSTM) model that predicts the host across species from viral nucleotide sequences only 400 bases long. The authors constructed three curated multi-class viral host datasets covering orthohantavirus, rotavirus A, and rabies lyssavirus, and evaluated performance using confusion matrices, F1-score, precision, recall, and micro-averaged AUC. The model achieved accuracies of 89.62%, 96.58%, and 77.22% on these three virus groups, respectively, which the authors report as outperforming previous studies. The lightweight architecture favors practical deployment, and the work offers a computationally efficient tool for early risk assessment of zoonotic diseases.

📝 Abstract
Humans and animals have coexisted throughout recorded history, and likely for far longer. Despite some beneficial interdependence, many animals carry viral diseases that can spread to humans. These diseases are known as zoonotic diseases. Recent outbreaks of SARS-CoV-2, monkeypox, and swine flu viruses have shown how these viruses can disrupt human life and cause death. Fast and accurate prediction of the host from which a virus spreads can help prevent these diseases from spreading. This work presents BiLSTM-VHP, a lightweight bidirectional long short-term memory (LSTM)-based architecture that can predict the host from the nucleotide sequence of orthohantavirus, rabies lyssavirus, and rotavirus A with high accuracy. The proposed model works with nucleotide sequences of 400 bases in length and achieved a prediction accuracy of 89.62% for orthohantavirus, 96.58% for rotavirus A, and 77.22% for rabies lyssavirus, outperforming previous studies. Moreover, the performance of the model is assessed using the confusion matrix, F1-score, precision, recall, and micro-averaged AUC. In addition, we introduce three curated datasets of orthohantavirus, rotavirus A, and rabies lyssavirus containing 8,575, 95,197, and 22,052 nucleotide sequences divided into 9, 12, and 29 host classes, respectively. The code and datasets are available at https://doi.org/10.17605/OSF.IO/ANFKR
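As a rough illustration of how the precision, recall, and F1-score named in the abstract can be derived from a confusion matrix, here is a minimal sketch of micro-averaging. The `micro_average` helper and the 3-class toy matrix are hypothetical and are not taken from the paper's code or results. Micro-averaged AUC is not covered because it requires per-class prediction scores.

```python
def micro_average(confusion):
    """Micro-averaged precision, recall, and F1 from a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    # True positives are the diagonal entries, pooled over all classes.
    tp = sum(confusion[i][i] for i in range(n))
    # Each off-diagonal entry is a false positive for its column's class
    # and a false negative for its row's class, so the pooled totals match.
    errors = sum(confusion[i][j] for i in range(n) for j in range(n) if i != j)
    fp = fn = errors
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy 3-class matrix (made-up numbers, for illustration only).
toy_confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
precision, recall, f1 = micro_average(toy_confusion)
accuracy = sum(toy_confusion[i][i] for i in range(3)) / sum(map(sum, toy_confusion))
print(precision, recall, f1, accuracy)
```

In single-label multi-class classification, the micro-averaged precision, recall, and F1 all equal the overall accuracy, which is why per-class or macro-averaged figures are usually reported alongside them when classes are imbalanced.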
Problem

Research questions and friction points this paper is trying to address.

Predicting viral host origins from nucleotide sequences
Developing a BiLSTM model for zoonotic disease prevention
Accurately identifying hosts for orthohantavirus, rabies lyssavirus, and rotavirus A
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bidirectional LSTM architecture for host prediction
Uses 400-base nucleotide sequences as input
Achieves high accuracy across three virus types
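The fixed 400-base input listed above can be illustrated with a minimal preprocessing sketch. This summary does not say how sequences are tokenized, padded, or truncated, so the integer vocabulary, the `encode_sequence` helper, and the pad/truncate policy below are assumptions for illustration, not the authors' pipeline.

```python
# Hypothetical preprocessing sketch: the vocabulary and padding policy are
# assumptions, not taken from the BiLSTM-VHP paper.
SEQ_LEN = 400  # input length stated in the abstract

# 0 is reserved for padding; any symbol outside A/C/G/T (e.g. N) maps to 5.
VOCAB = {"A": 1, "C": 2, "G": 3, "T": 4}
PAD, UNKNOWN = 0, 5


def encode_sequence(seq: str, length: int = SEQ_LEN) -> list[int]:
    """Map a nucleotide string to a fixed-length list of integer tokens."""
    # RNA genomes may be written with U instead of T, so normalise first.
    normalised = seq.upper().replace("U", "T")
    tokens = [VOCAB.get(base, UNKNOWN) for base in normalised]
    tokens = tokens[:length]                         # truncate long sequences
    return tokens + [PAD] * (length - len(tokens))   # right-pad short ones


encoded = encode_sequence("ACGUNacgt")
print(len(encoded), encoded[:10])
```

A fixed-length integer encoding like this is one common way to feed sequences to an embedding layer in front of a recurrent network; one-hot or k-mer encodings are equally plausible alternatives.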
Authors

Azher Ahmed Efat
Department of Computer Science, Iowa State University, Ames, IA, 50010, USA
Department of Computer Science and Engineering, Brac University, Dhaka, Bangladesh
Farzana Islam
Biotechnology Program, Department of Mathematics and Natural Sciences, Brac University, Dhaka, Bangladesh
Annajiat Alim Rasel
Department of Computer Science and Engineering, School of Data and Sciences, BRAC University
Natural Language Processing, Distributed Systems, High Performance Computing, Artificial Intelligence, Information Security
Munima Haque
Biotechnology Program, Department of Mathematics and Natural Sciences, Brac University, Dhaka, Bangladesh