Predicting BVD Re-emergence in Irish Cattle From Highly Imbalanced Herd-Level Data Using Machine Learning Algorithms

📅 2025-04-17
🤖 AI Summary
This study addresses the challenge of predicting Bovine Viral Diarrhoea (BVD) re-emergence in Irish cattle herds, where herd-level surveillance data exhibit extreme class imbalance (positive rate <1%). To meet the need for accurate early warning, we propose a livestock-specific machine learning modeling framework and systematically evaluate the generalizability of Random Forest, XGBoost, and anomaly detection methods on real-world veterinary monitoring data. We introduce a customized evaluation framework that balances clinical sensitivity against screening cost, incorporating PPV, F1-score, AUC, and a business-driven recall constraint. The optimized Random Forest model achieved 87.6% sensitivity (detecting 219 of 250 confirmed positive herds) and the highest AUC in 2023 field validation. It also reduced the number of herds requiring targeted surveillance by 50% compared to universal screening, which makes the allocation of disease control resources substantially more efficient.

📝 Abstract
Bovine Viral Diarrhoea (BVD) has been the focus of a successful eradication programme in Ireland, with the herd-level prevalence declining from 11.3% in 2013 to just 0.2% in 2023. As the country moves toward BVD freedom, the development of predictive models for targeted surveillance becomes increasingly important to mitigate the risk of disease re-emergence. In this study, we evaluate the performance of a range of machine learning algorithms, including binary classification and anomaly detection techniques, for predicting BVD-positive herds using highly imbalanced herd-level data. We conduct an extensive simulation study to assess model performance across varying sample sizes and class imbalance ratios, incorporating resampling, class weighting, and appropriate evaluation metrics (sensitivity, positive predictive value, F1-score and AUC values). Random forest and XGBoost models consistently outperformed other methods. The random forest model achieved the highest sensitivity and AUC across scenarios, including real-world prediction of 2023 herd status, where it correctly identified 219 of 250 positive herds while halving the number of herds that require testing compared to a blanket-testing strategy.
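
The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. Only the true-positive and false-negative counts (219 detected out of 250 positive herds) come from the abstract; the false-positive count is not reported here, so `FP_ASSUMED` is a purely hypothetical placeholder, and the function name `screening_metrics` is likewise an illustration rather than anything from the paper.

```python
def screening_metrics(tp, fn, fp):
    """Return (sensitivity, PPV, F1) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of positive herds that were flagged
    ppv = tp / (tp + fp)           # share of flagged herds that are truly positive
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of the two
    return sensitivity, ppv, f1


# Counts taken from the abstract: 219 of 250 positive herds were detected.
TP, FN = 219, 250 - 219

# HYPOTHETICAL: the number of false positives is not given in the source.
# This value only makes the PPV and F1 lines executable.
FP_ASSUMED = 10_000

sens, ppv, f1 = screening_metrics(TP, FN, FP_ASSUMED)
print(f"sensitivity = {sens:.3f}")
print(f"PPV (hypothetical FP count) = {ppv:.4f}")
print(f"F1  (hypothetical FP count) = {f1:.4f}")
```

With a prevalence as low as 0.2%, PPV can stay small even when sensitivity is high, which is one reason to report several metrics side by side rather than accuracy alone.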
Problem

Research questions and friction points this paper is trying to address.

Predict BVD re-emergence in Irish cattle herds
Handle highly imbalanced herd-level data effectively
Compare machine learning models for optimal surveillance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Machine learning for imbalanced data prediction
Random forests and XGBoost outperform others
Resampling and class weighting improve sensitivity
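
As a rough illustration of the resampling and class-weighting ideas listed above, the following standard-library Python sketch computes inverse-frequency class weights and performs simple random oversampling of the minority class. The summary does not say which resampling or weighting schemes the authors used, so both choices are assumptions, and the toy data (1,000 herds with a 0.2% positive rate) and the helper names are hypothetical.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * m) for cls, m in counts.items()}


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, m in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - m)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# HYPOTHETICAL toy data: 1,000 herds, 2 of them positive (0.2%).
labels = [1] * 2 + [0] * 998
rows = [[i] for i in range(len(labels))]  # one dummy feature per herd

weights = balanced_class_weights(labels)
bal_rows, bal_labels = random_oversample(rows, labels)

print(weights)
print(Counter(bal_labels))
```

Oversampling of this kind is normally applied to the training split only, so that duplicated positive herds do not leak into the data used for evaluation.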