🤖 AI Summary
This study addresses the difficulty of tracing the hosts of zoonotic viruses and the resulting delay in early warning by proposing a lightweight bidirectional long short-term memory (BiLSTM) model that predicts the host across species from viral nucleotide sequences of only 400 bp. We constructed three curated, multi-class viral host datasets, covering hantaviruses, rotavirus A, and rabies virus, and evaluated performance using confusion matrices, F1-score, precision, recall, and micro-averaged AUC. The model achieved accuracies of 89.62%, 96.58%, and 77.22% on these three virus groups, respectively, outperforming previous studies. Its lightweight architecture keeps the model practical to deploy while retaining this accuracy. This work provides a scalable, computationally efficient tool for early risk assessment of zoonotic diseases.
📝 Abstract
Humans and animals have coexisted throughout recorded history, and the relationship likely began much earlier. Despite some beneficial interdependence, many animals carry viruses that can spread to humans; the resulting diseases are known as zoonotic diseases. Recent outbreaks of SARS-CoV-2, monkeypox, and swine flu viruses have shown how these viruses can disrupt human life and cause death. Fast and accurate prediction of the host from which a virus spreads can help prevent these diseases from spreading. This work presents BiLSTM-VHP, a lightweight bidirectional long short-term memory (BiLSTM)-based architecture that predicts the host from the nucleotide sequence of orthohantavirus, rabies lyssavirus, and rotavirus A with high accuracy. The proposed model works with nucleotide sequences of 400 bases in length and achieved a prediction accuracy of 89.62% for orthohantavirus, 96.58% for rotavirus A, and 77.22% for rabies lyssavirus, outperforming previous studies. The performance of the model is further assessed using the confusion matrix, F1-score, precision, recall, and micro-average AUC. In addition, we introduce three curated datasets of orthohantavirus, rotavirus A, and rabies lyssavirus containing 8,575, 95,197, and 22,052 nucleotide sequences divided into 9, 12, and 29 host classes, respectively. The code and datasets are available at https://doi.org/10.17605/OSF.IO/ANFKR
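
To make the fixed 400-base input concrete, the following minimal sketch shows one way a nucleotide string could be turned into a fixed-length integer sequence suitable for an embedding layer in front of a BiLSTM. It is an illustration only, not the authors' preprocessing: the vocabulary, the handling of ambiguous bases, the U-to-T mapping, the right-padding scheme, and the names `BASE_TO_INDEX` and `encode_sequence` are all assumptions. Only the length of 400 comes from the abstract.

```python
from typing import List

# Hypothetical vocabulary (an assumption, not taken from the paper):
# 0 is reserved for padding, 5 for ambiguous or unknown bases such as N.
BASE_TO_INDEX = {"A": 1, "C": 2, "G": 3, "T": 4}
PAD_INDEX = 0
UNKNOWN_INDEX = 5
SEQUENCE_LENGTH = 400  # input length stated in the abstract


def encode_sequence(sequence: str, length: int = SEQUENCE_LENGTH) -> List[int]:
    """Map a nucleotide string to a fixed-length list of integer tokens."""
    # Assumption: RNA-style U is treated as T so both notations share one vocabulary.
    normalized = sequence.upper().replace("U", "T")
    # Truncate to `length` bases, mapping anything outside A/C/G/T to UNKNOWN_INDEX.
    tokens = [BASE_TO_INDEX.get(base, UNKNOWN_INDEX) for base in normalized[:length]]
    # Right-pad shorter sequences with PAD_INDEX up to the fixed length.
    tokens.extend([PAD_INDEX] * (length - len(tokens)))
    return tokens


example = encode_sequence("ACGUNacgt")
print(len(example), example[:10])  # prints: 400 [1, 2, 3, 4, 5, 1, 2, 3, 4, 0]
```

Reserving index 0 for padding is a common convention because it lets a downstream embedding layer mask padded positions; whether BiLSTM-VHP pads, masks, or uses only sequences of exactly 400 bases is not stated in the abstract.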