Predictive Modeling and Explainable AI for Veterinary Safety Profiles, Residue Assessment, and Health Outcomes Using Real-World Data and Physicochemical Properties

📅 2025-10-01
🤖 AI Summary
This study addresses the safety assessment of veterinary drugs used in food-producing animals. We propose a predictive framework that integrates real-world veterinary adverse event reports with drug physicochemical and pharmacokinetic properties. The framework classifies post-treatment outcomes (Death vs. Recovery) and supports tissue residue risk assessment. Class imbalance is a particular problem for rare fatal events. To mitigate it, we introduce Average Uncertainty Margin (AUM)-based pseudo-labeling, combined with VeDDRA ontology-based standardization, synthetic oversampling, and ensemble learning. CatBoost serves as the primary classifier. SHAP provides biologically interpretable feature attributions and identifies the key clinical, demographic, and physicochemical drivers of lethality. We evaluated the framework on ~1.28 million real-world veterinary reports, where CatBoost and the ensemble methods achieve precision, recall, and F1-scores of 0.95, improving early detection of fatal outcomes. The framework delivers an interpretable, robust decision-support tool for animal welfare protection, human food safety regulation, and residue risk management.
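
The summary mentions synthetic oversampling without naming the method. A common choice is SMOTE-style interpolation between minority-class rows. What follows is a minimal, standard-library sketch under that assumption, not the authors' code. The helper names `euclidean` and `smote_like_oversample`, the neighbour count `k`, and the example feature values are all hypothetical.

```python
import random
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two numeric feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def smote_like_oversample(
    minority: List[List[float]], n_new: int, k: int = 3, seed: int = 0
) -> List[List[float]]:
    """Hypothetical SMOTE-style helper (not the paper's implementation).

    Creates n_new synthetic minority rows. Each row is a random point on the
    segment between a minority row and one of its k nearest minority
    neighbours. Assumes all features are numeric and already scaled.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [row for j, row in enumerate(minority) if j != i]
        # k nearest minority neighbours of the chosen row
        neighbours = sorted(others, key=lambda row: euclidean(base, row))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic


# Hypothetical scaled features for a few fatal-outcome reports (minority class)
deaths = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_rows = smote_like_oversample(deaths, n_new=6)
print(len(new_rows))  # 6
```

In practice, imbalanced-learn provides `SMOTE`, and `SMOTENC` for mixed numeric and categorical features. With either method, oversampling should be applied only to the training split, so that synthetic fatal cases never reach the evaluation data.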

📝 Abstract
The safe use of pharmaceuticals in food-producing animals is vital to protect animal welfare and human food safety. Adverse events (AEs) may signal unexpected pharmacokinetic or toxicokinetic effects, which increase the risk of violative residues in the food chain. This study introduces a predictive framework for classifying outcomes (Death vs. Recovery). It uses ~1.28 million reports (1987 to 2025 Q1) from the U.S. FDA Center for Veterinary Medicine, accessed through openFDA. A preprocessing pipeline merged relational tables and standardized AEs through VeDDRA ontologies. Data were normalized, missing values were imputed, and high-cardinality features were reduced. Physicochemical drug properties were integrated to capture chemical-residue links. We evaluated supervised models, including Random Forest, CatBoost, XGBoost, ExcelFormer, and large language models (Gemma 3-27B, Phi 3-12B). Class imbalance was addressed with techniques such as undersampling and oversampling, with a focus on maximizing recall for fatal outcomes. Ensemble methods (Voting, Stacking) and CatBoost performed best, achieving precision, recall, and F1-scores of 0.95. Adding Average Uncertainty Margin (AUM)-based pseudo-labeling of uncertain cases improved minority-class detection, particularly for ExcelFormer and XGBoost. SHAP-based interpretation identified biologically plausible predictors that were strongly linked to fatal outcomes. These included lung, heart, and bronchial disorders, animal demographics, and drug physicochemical properties. Overall, the framework shows that combining rigorous data engineering, advanced machine learning, and explainable AI enables accurate, interpretable predictions of veterinary safety outcomes. The approach supports the mission of FARAD (Food Animal Residue Avoidance Databank) in three ways: early detection of high-risk drug-event profiles, stronger residue risk assessment, and better-informed regulatory and clinical decisions.
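
The abstract does not define the Average Uncertainty Margin, so the following reading is an assumption. It mirrors margin statistics used to flag mislabeled or ambiguous training examples. At each training epoch, take the score of the recorded class minus the largest score among the other classes, then average over epochs. Cases with a consistently negative margin are ones the model keeps assigning to another class. The sketch makes several further assumptions, all hypothetical: it uses predicted probabilities rather than logits, a threshold of 0.0, and a "pseudo-label = class with the highest mean score" rule. The function names are also hypothetical.

```python
from typing import Dict, List, Sequence


def margin(scores: Sequence[float], label: int) -> float:
    """Score of the recorded class minus the largest score of any other class."""
    best_other = max(s for c, s in enumerate(scores) if c != label)
    return scores[label] - best_other


def average_margin(per_epoch: List[Sequence[float]], label: int) -> float:
    """Mean margin of one sample across training epochs (or checkpoints)."""
    return sum(margin(s, label) for s in per_epoch) / len(per_epoch)


def pseudo_label_uncertain(
    history: Dict[str, List[Sequence[float]]],
    labels: Dict[str, int],
    threshold: float = 0.0,
) -> Dict[str, int]:
    """Hypothetical helper: for samples whose average margin is below
    `threshold`, return the class with the highest mean score across epochs."""
    relabelled: Dict[str, int] = {}
    for sid, epochs in history.items():
        if average_margin(epochs, labels[sid]) < threshold:
            n_classes = len(epochs[0])
            mean = [sum(e[c] for e in epochs) / len(epochs) for c in range(n_classes)]
            relabelled[sid] = mean.index(max(mean))
    return relabelled


# Class 0 = Recovery, class 1 = Death; per-epoch probabilities are made up.
history = {
    "r1": [[0.90, 0.10], [0.95, 0.05]],  # recorded Recovery, model agrees
    "r2": [[0.30, 0.70], [0.20, 0.80]],  # recorded Recovery, model keeps leaning Death
}
labels = {"r1": 0, "r2": 0}
print(pseudo_label_uncertain(history, labels))  # {'r2': 1}
```

Under this reading, an uncertain report that the model keeps scoring as fatal is pseudo-labeled Death. This is one plausible mechanism by which such a step could raise minority-class recall.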
Problem

Predicting veterinary drug safety outcomes using real-world data and machine learning
Identifying drug residue risks through physicochemical properties and adverse events
Developing an explainable AI framework for animal health and food safety assessment
Innovation

Predicting veterinary outcomes using machine learning models
Integrating drug physicochemical properties with real-world data
Explaining predictions via SHAP for biological interpretability
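
SHAP attributions are Shapley values from cooperative game theory. With only a few features, they can be computed exactly by enumerating every coalition, which makes the idea concrete. The sketch makes two assumptions. First, the value function replaces "absent" features with a single baseline. Second, a hypothetical linear `risk_score` stands in for the paper's CatBoost model. The feature names and numbers are illustrative only.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def shapley_values(
    model: Callable[[List[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values for a single prediction model(x).

    A feature outside a coalition is replaced by its baseline value
    (an assumed value function; SHAP typically averages over a
    background dataset rather than using one baseline).
    """
    n = len(x)

    def value(coalition: Set[int]) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def risk_score(z: List[float]) -> float:
    """Hypothetical linear stand-in for a fatality model.
    Features: [lung_disorder, heart_disorder, age_years]."""
    return 2.0 * z[0] + 1.0 * z[1] + 0.5 * z[2]


x = [1.0, 1.0, 4.0]         # report with lung and heart disorders, age 4
baseline = [0.0, 0.0, 2.0]  # hypothetical reference animal
phi = shapley_values(risk_score, x, baseline)
print([round(v, 6) for v in phi])  # [2.0, 1.0, 1.0]
```

For a linear model with a single baseline, each attribution reduces to w_i(x_i - b_i), and the attributions sum to f(x) - f(baseline). Exact enumeration needs O(2^n) model calls, so it is only practical for a handful of features. For tree ensembles such as CatBoost, the SHAP library's `TreeExplainer` computes these attributions efficiently.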
Authors

Hossein Sholehrasa
1DATA Consortium and FARAD Program, Kansas State University, Olathe, KS, USA

Xuan Xu
Department of Statistics, Kansas State University, Manhattan, KS, USA

Doina Caragea
Kansas State University
deep learning, text mining, data mining, data science

Jim E. Riviere
1DATA Consortium and FARAD Program, Kansas State University, Olathe, KS, USA

Majid Jaberi-Douraki
Kansas State University
Mathematical Biology, Big Data, Data Science, One Health, 1DATA