Performance Analysis of Machine Learning Algorithms in Chronic Kidney Disease Prediction

📅 2025-10-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses early risk prediction for chronic kidney disease (CKD) by evaluating eight machine learning algorithms on the publicly available CKD dataset from the UCI Machine Learning Repository. Missing values in the dataset are filled using mean-mode imputation and a random sampling method. Random Forest and Logistic Regression both reach 99% accuracy, well above k-Nearest Neighbors, which ranks last at 73%; AdaBoost, XGBoost, Naive Bayes, Decision Tree, and SVM fall in between. The results suggest that a linear model such as Logistic Regression can match the accuracy of ensemble methods on this dataset while being easier to interpret and cheaper to compute, which makes it a plausible candidate for CKD screening tools.

📝 Abstract

The kidneys act as the filter of the human body. About 10% of the global population is thought to be affected by Chronic Kidney Disease (CKD), which causes kidney function to decline. To protect at-risk patients from further kidney damage, effective CKD risk evaluation and appropriate monitoring are crucial. Because they can detect disease quickly and accurately, machine learning models can help practitioners achieve this goal efficiently, and a large number of diagnostic systems and processes in the healthcare sector now rely on machine learning for disease prediction. In this study, we design and propose computer-aided predictive models for the diagnosis of CKD. The CKD dataset is obtained from the UCI Machine Learning Repository and contains some missing values, which are filled in using the "mean-mode" and "random sampling" strategies. After imputing the missing data, eight ML techniques (Random Forest, SVM, Naive Bayes, Logistic Regression, KNN, XGBoost, Decision Tree, and AdaBoost) were used to build models, and their accuracies were compared to find the best-performing models. Among them, Random Forest and Logistic Regression achieved an outstanding 99% accuracy, followed by AdaBoost, XGBoost, Naive Bayes, Decision Tree, and SVM, whereas the KNN classifier ranked last with an accuracy of 73%.
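The abstract names two ways of filling missing values but does not spell them out. The following is a minimal sketch of how "mean-mode" and "random sampling" imputation are commonly understood, not the authors' implementation. The function names, the column names `age` and `htn`, and the toy values are all hypothetical and chosen only for illustration.

```python
import random
from statistics import mean, mode


def impute_mean_mode(rows, numeric_cols, categorical_cols):
    """Fill missing (None) numeric values with the column mean and
    missing categorical values with the column mode.
    Assumption: this is the usual reading of "mean-mode" imputation."""
    filled = [dict(row) for row in rows]  # copy so the input is not modified
    for cols, summarize in ((numeric_cols, mean), (categorical_cols, mode)):
        for col in cols:
            observed = [row[col] for row in rows if row[col] is not None]
            fill_value = summarize(observed)
            for row in filled:
                if row[col] is None:
                    row[col] = fill_value
    return filled


def impute_random_sample(rows, cols, seed=0):
    """Fill each missing (None) value with a value drawn at random from the
    observed values of the same column.
    Assumption: this is what the "random sampling" strategy refers to."""
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    filled = [dict(row) for row in rows]
    for col in cols:
        observed = [row[col] for row in rows if row[col] is not None]
        for row in filled:
            if row[col] is None:
                row[col] = rng.choice(observed)
    return filled


# Made-up records for illustration only; not taken from the CKD dataset.
toy_rows = [
    {"age": 48, "htn": "yes"},
    {"age": None, "htn": "no"},
    {"age": 62, "htn": None},
    {"age": 55, "htn": "no"},
]

mean_mode_rows = impute_mean_mode(toy_rows, ["age"], ["htn"])
sampled_rows = impute_random_sample(toy_rows, ["age", "htn"], seed=0)
```

With the toy records above, mean-mode imputation sets the missing `age` to 55 (the mean of 48, 62, and 55) and the missing `htn` to "no" (the most frequent value). Random sampling instead copies one of the observed values, which preserves the spread of the column at the cost of adding noise.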

Problem

Research questions and friction points this paper is trying to address.

Evaluating the accuracy of machine learning algorithms for chronic kidney disease prediction

Comparing the performance of eight ML models on the UCI CKD dataset after imputing missing values

Identifying Random Forest and Logistic Regression as top-performing CKD diagnostic models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Used mean-mode and random-sampling imputation to fill missing values

Compared eight machine learning algorithms for CKD prediction

Achieved 99% accuracy with Random Forest and Logistic Regression
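The comparison described in the points above can be sketched roughly as follows. This is an illustrative outline under several assumptions, not the paper's pipeline: it uses synthetic data from scikit-learn's `make_classification` rather than the CKD dataset, a single 75/25 stratified split, default hyperparameters, and the hypothetical helper name `compare_models`. XGBoost is left out so the sketch depends only on scikit-learn; if the `xgboost` package is installed, its `XGBClassifier` could be added to the dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, seed=0):
    """Train each classifier on the same split and return test accuracies.
    Assumption: one hold-out split; the paper's protocol is not stated here."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    models = {
        "Random Forest": RandomForestClassifier(random_state=seed),
        "SVM": SVC(),
        "Naive Bayes": GaussianNB(),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in data; the shape is arbitrary and not the CKD dataset.
X, y = make_classification(
    n_samples=400, n_features=24, n_informative=8, random_state=0
)
scores = compare_models(X, y)

# Print the models from highest to lowest test accuracy.
for name, acc in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:20s} {acc:.3f}")
```

The accuracies printed for synthetic data say nothing about the reported CKD results; the sketch only shows the shape of a like-for-like comparison, where every model sees the same training and test rows.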
