🤖 AI Summary
This work proposes an ensemble learning framework that jointly optimizes performance, interpretability, and cross-regional fairness to address three key challenges in anomaly detection for distributed power plants: extreme class imbalance, model opacity, and lack of regional fairness. The framework mitigates data imbalance via SMOTE-Tomek/ENN resampling, provides feature-level interpretability using SHAP values, and incorporates Disparate Impact Ratio (DIR) and Maximum Mean Discrepancy (MMD) to evaluate regional fairness and cross-domain generalization, respectively. Evaluated on diesel generator data, the LightGBM/XGBoost-based approach achieves an F1-score of 0.99 and a DIR of approximately 0.95, identifying fuel consumption rate and daily operating duration as critical predictive features. The solution further supports low-latency, containerized deployment for real-time inference.
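The Disparate Impact Ratio mentioned above can be illustrated as the ratio of the lowest to the highest positive-prediction rate across regional clusters (a value near 1.0 indicating little regional bias). The sketch below, with hypothetical cluster labels and predictions, shows one minimal way to compute it; it is not the paper's actual implementation.

```python
import numpy as np

def disparate_impact_ratio(y_pred, groups):
    """DIR = min group positive rate / max group positive rate."""
    rates = {}
    for g in np.unique(groups):
        mask = groups == g
        rates[g] = y_pred[mask].mean()  # fraction flagged positive in group g
    return min(rates.values()) / max(rates.values())

# Hypothetical example: two regional clusters with unequal flag rates.
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])
groups = np.array(["north", "north", "north", "north",
                   "south", "south", "south", "south"])
print(disparate_impact_ratio(y_pred, groups))  # prints 0.3333333333333333
```

A DIR of roughly 0.95, as reported for LightGBM, means the least-flagged cluster receives positive predictions at about 95% of the rate of the most-flagged cluster.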
📝 Abstract
Reliable anomaly detection in distributed power plant monitoring systems is essential for ensuring operational continuity and reducing maintenance costs, particularly in regions where telecom operators heavily rely on diesel generators. However, this task is challenged by extreme class imbalance, lack of interpretability, and potential fairness issues across regional clusters. In this work, we propose a supervised machine learning framework that integrates ensemble methods (LightGBM, XGBoost, Random Forest, CatBoost, GBDT, AdaBoost) and baseline models (Support Vector Machine, K-Nearest Neighbors, Multilayer Perceptrons, and Logistic Regression) with advanced resampling techniques (SMOTE with Tomek Links and ENN) to address imbalance in a dataset of diesel generator operations in Cameroon. Interpretability is achieved through SHAP (SHapley Additive exPlanations), while fairness is quantified using the Disparate Impact Ratio (DIR) across operational clusters. We further evaluate model generalization using Maximum Mean Discrepancy (MMD) to capture domain shifts between regions. Experimental results show that ensemble models consistently outperform baselines, with LightGBM achieving an F1-score of 0.99 and minimal bias across clusters (DIR $\approx 0.95$). SHAP analysis highlights fuel consumption rate and runtime per day as dominant predictors, providing actionable insights for operators. Our findings demonstrate that it is possible to balance performance, interpretability, and fairness in anomaly detection, paving the way for more equitable and explainable AI systems in industrial power management. {\color{black} Finally, beyond offline evaluation, we also discuss how the trained models can be deployed in practice for real-time monitoring. We show how containerized services can process incoming generator telemetry in real time, deliver low-latency predictions, and provide interpretable outputs for operators.}
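The MMD used above to quantify domain shift between regions can be sketched with a plain RBF-kernel estimator. The function below is a minimal biased estimator assuming only NumPy; the kernel bandwidth `gamma` and the synthetic data are illustrative placeholders, not the paper's configuration.

```python
import numpy as np

def mmd_rbf(X, Y, gamma=1.0):
    """Biased MMD^2 estimate between samples X and Y using an RBF kernel."""
    def k(A, B):
        # Pairwise squared Euclidean distances, then RBF kernel values.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

# Hypothetical feature samples from two regions: a shift in the mean
# yields a larger MMD, signalling a domain gap between clusters.
rng = np.random.default_rng(0)
region_a = rng.normal(0.0, 1.0, size=(50, 3))
region_b = rng.normal(2.0, 1.0, size=(50, 3))
print(mmd_rbf(region_a, region_a))  # identical samples: MMD^2 = 0
print(mmd_rbf(region_a, region_b))  # shifted samples: MMD^2 > 0
```

A small MMD between two regional clusters suggests a model trained on one region can be expected to transfer to the other with limited degradation.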