🤖 AI Summary
This study proposes a lightweight hybrid modeling framework for water quality prediction and interpretable monitoring under small-sample, highly seasonal conditions in developing countries such as Nepal. The framework combines a CNN-RNN neural network with tree-based models (e.g., CatBoost, XGBoost) in a multi-source feature fusion architecture; applies LIME for local interpretability in water quality classification, attributing individual decisions to key pollution factors; and builds a RAG-enhanced, water-sustainability-oriented question-answering system for water quality, which the summary describes as the first of its kind. For WQI regression, the tree-based models reach an average RMSE of 1.2 (R² = 0.99). For classification, the ensemble classifiers reach 99% accuracy and the neural network reaches 92%; in regression, the neural network achieves R² = 0.97. The system supports real-time forecasting, attribution visualization, and natural-language water quality querying, with the aim of supporting water safety decision-making in resource-constrained settings.
📝 Abstract
Ensuring safe water supplies requires effective water quality monitoring, especially in developing countries like Nepal, where contamination risks are high. This paper introduces a hybrid deep learning approach to predict Nepal's seasonal water quality from a small dataset with multiple water quality parameters. Tree-based models (CatBoost, XGBoost, Extra Trees, and LightGBM) are used alongside a neural network that combines CNN and RNN layers to capture temporal and spatial patterns in the data. The approach showed notable accuracy improvements, which can aid proactive water quality control. CatBoost, XGBoost, and Extra Trees Regressor predicted Water Quality Index (WQI) values with an average RMSE of 1.2 and an R² score of 0.99. The classifiers achieved 99 percent accuracy under cross-validation across models. LIME analysis highlighted the importance of indicators such as electrical conductivity (EC) and dissolved oxygen (DO) levels in XGBoost classification decisions. The neural network model achieved 92 percent classification accuracy and, in regression analysis, an R² score of 0.97 with an RMSE of 2.87. Furthermore, a multifunctional application was developed to predict WQI values using both regression and classification methods.
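For readers less familiar with the two regression metrics quoted above, the following minimal sketch shows how RMSE and the R² score are conventionally computed. It is not the authors' evaluation code: the function names `rmse` and `r2_score` and the WQI values are hypothetical, chosen only to illustrate the formulas.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical WQI values for illustration only (not from the paper's dataset).
observed = [62.0, 71.5, 80.0, 55.5]
predicted = [63.0, 70.5, 81.0, 54.5]

wqi_rmse = rmse(observed, predicted)
wqi_r2 = r2_score(observed, predicted)
print(f"RMSE = {wqi_rmse:.2f}, R^2 = {wqi_r2:.3f}")
```

RMSE is expressed in the same units as the WQI itself, so an RMSE of 1.2 means predictions are typically off by about one index point, whereas R² is unitless and measures the share of variance in the observed WQI that the predictions explain.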