🤖 AI Summary
This study addresses the challenge of early risk prediction for chronic kidney disease (CKD) by systematically evaluating eight machine learning algorithms on the publicly available UCI clinical dataset. To handle missing data, a key obstacle in clinical prediction, the authors fill the gaps using mean-mode imputation and a random sampling method. Experimental results show that Random Forest and Logistic Regression both achieve a classification accuracy of 99%, substantially outperforming k-Nearest Neighbors (73%); ensemble methods such as XGBoost and AdaBoost also perform well. Notably, the findings indicate that a well-engineered linear model can match the predictive accuracy of more sophisticated ensemble approaches while offering better interpretability and lower computational overhead. This supports the deployment of accurate, transparent, and resource-efficient CKD screening tools in primary care settings.
📝 Abstract
The kidneys are the filters of the human body. Chronic Kidney Disease (CKD), which causes kidney function to decline, is thought to affect about 10% of the global population. To protect at-risk patients from further kidney damage, effective CKD risk evaluation and appropriate monitoring are crucial. Because Machine Learning models can detect disease quickly and precisely, they can help practitioners accomplish this goal efficiently, and a large number of diagnostic systems and processes in the healthcare sector now rely on them for disease prediction. In this study, we designed and proposed predictive computer-aided models for the diagnosis of CKD. The CKD dataset was obtained from the UCI Machine Learning Repository and contains some missing values, which were filled in using the "mean-mode" and "random sampling" strategies. After imputing the missing data, eight ML techniques (Random Forest, SVM, Naive Bayes, Logistic Regression, KNN, XGBoost, Decision Tree, and AdaBoost) were used to build models, and their accuracies were compared to identify the best-performing models. Among them, Random Forest and Logistic Regression both achieved an outstanding 99% accuracy, followed by AdaBoost, XGBoost, Naive Bayes, Decision Tree, and SVM, whereas the KNN classifier ranked last with an accuracy of 73%.
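The two imputation strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy records, and the column names (`age`, `bp`, `rbc`, `label`) are hypothetical. The reading of "random sampling" as drawing each missing value from the observed values of the same column is also an assumption, since the abstract does not define it.

```python
import random
from statistics import mean, mode


def impute_mean_mode(rows, numeric_cols, categorical_cols):
    """Return a copy of rows where None is replaced by the column mean
    (numeric columns) or the column mode (categorical columns)."""
    fill = {}
    for col in numeric_cols:
        fill[col] = mean(r[col] for r in rows if r[col] is not None)
    for col in categorical_cols:
        fill[col] = mode(r[col] for r in rows if r[col] is not None)
    return [
        {c: (fill[c] if v is None and c in fill else v) for c, v in r.items()}
        for r in rows
    ]


def impute_random_sample(rows, cols, seed=0):
    """Return a copy of rows where None is replaced by a value drawn at
    random from the observed values of the same column (assumed meaning
    of the "random sampling" strategy)."""
    rng = random.Random(seed)
    observed = {c: [r[c] for r in rows if r[c] is not None] for c in cols}
    return [
        {c: (rng.choice(observed[c]) if v is None and c in observed else v)
         for c, v in r.items()}
        for r in rows
    ]


# Toy records with hypothetical column names; None marks a missing value.
toy = [
    {"age": 48, "bp": 80, "rbc": "normal", "label": "ckd"},
    {"age": None, "bp": 70, "rbc": None, "label": "ckd"},
    {"age": 62, "bp": None, "rbc": "normal", "label": "notckd"},
    {"age": 40, "bp": 90, "rbc": "abnormal", "label": "notckd"},
]

filled = impute_mean_mode(toy, numeric_cols=["age", "bp"], categorical_cols=["rbc"])
sampled = impute_random_sample(toy, cols=["age", "bp", "rbc"])
```

Mean-mode imputation is deterministic and pulls every filled value toward the centre of its column, whereas random sampling preserves the spread of the observed values at the cost of run-to-run variation unless the seed is fixed. Both functions return new records and leave the input untouched, so the two strategies can be compared on the same data before training any of the eight classifiers.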