Performance Analysis of Machine Learning Algorithms in Chronic Kidney Disease Prediction

📅 2025-10-10
🤖 AI Summary
This study addresses early risk prediction for chronic kidney disease (CKD) by evaluating eight machine learning algorithms on the publicly available UCI clinical dataset. Missing values in the dataset are filled in using mean-mode imputation and a random sampling method. Random Forest and Logistic Regression both reach 99% classification accuracy, well ahead of k-Nearest Neighbors (73%), which ranks last; AdaBoost, XGBoost, Naive Bayes, Decision Tree, and SVM fall in between. The results suggest that a linear model such as Logistic Regression can match the accuracy of an ensemble method on this dataset while being easier to interpret and cheaper to run, which may make it a practical candidate for CKD screening tools in primary care settings.

📝 Abstract
Kidneys are the filters of the human body. About 10% of the global population is thought to be affected by Chronic Kidney Disease (CKD), which causes kidney function to decline. To protect at-risk patients from additional kidney damage, effective CKD risk evaluation and appropriate monitoring are crucial. Because Machine Learning models can detect disease quickly and precisely, they can help practitioners accomplish this goal efficiently, and a large number of diagnostic systems and processes in the healthcare sector now rely on machine learning for disease prediction. In this study, we design and propose computer-aided predictive models for the diagnosis of CKD. The CKD dataset is obtained from the UCI Machine Learning Repository and contains a few missing values, which are filled in using the "mean-mode" and "random sampling" strategies. After imputing the missing data, eight ML techniques (Random Forest, SVM, Naive Bayes, Logistic Regression, KNN, XGBoost, Decision Tree, and AdaBoost) were used to build models, and their accuracies were compared to find the best-performing model. Among them, Random Forest and Logistic Regression showed an outstanding 99% accuracy, followed by AdaBoost, XGBoost, Naive Bayes, Decision Tree, and SVM, whereas the KNN classifier ranked last with an accuracy of 73%.
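The abstract names two strategies for filling missing values but does not give their details. The following is a minimal sketch of one common reading, assuming that "mean-mode" means the mean for numeric columns and the mode for categorical columns, and that "random sampling" means drawing replacements from the observed values of the same column. The function names and the toy columns are hypothetical and are not taken from the paper or the dataset.

```python
import random
from statistics import mean, mode


def impute_mean_mode(column):
    """Fill None entries with the mean (numeric column) or the mode (categorical column)."""
    observed = [v for v in column if v is not None]
    numeric = all(isinstance(v, (int, float)) for v in observed)
    fill = mean(observed) if numeric else mode(observed)
    return [fill if v is None else v for v in column]


def impute_random_sample(column, rng):
    """Fill each None entry with a value drawn at random from the observed entries."""
    observed = [v for v in column if v is not None]
    return [rng.choice(observed) if v is None else v for v in column]


# Hypothetical toy columns, not values from the CKD dataset.
age = [48, None, 62, 51]
appetite = ["good", "poor", None, "good"]

age_filled = impute_mean_mode(age)            # gap replaced by the column mean
appetite_filled = impute_mean_mode(appetite)  # gap replaced by the most frequent label
age_sampled = impute_random_sample(age, random.Random(0))  # gap replaced by an observed value
```

Mean-mode imputation is deterministic but shrinks the variance of a column, while random sampling preserves the observed distribution at the cost of run-to-run variation unless the random generator is seeded. Both helpers assume each column has at least one observed value.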
Problem

Research questions and friction points this paper is trying to address.

Evaluating machine learning algorithms for chronic kidney disease prediction accuracy
Comparing performance of eight ML models using UCI dataset with imputed values
Identifying Random Forest and Logistic Regression as top-performing CKD diagnostic models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Used mean-mode and random sampling for missing data
Compared eight machine learning algorithms for CKD prediction
Achieved 99% accuracy with Random Forest and Logistic Regression
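The comparison described above ranks models by accuracy, the fraction of test samples classified correctly. A minimal sketch of that ranking step follows, assuming each model's predictions on a shared test set are already available. The model names, labels, and predictions are hypothetical placeholders and do not reproduce the paper's results.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rank_by_accuracy(y_true, predictions):
    """Return (model name, accuracy) pairs sorted from best to worst."""
    scores = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical labels (1 = CKD, 0 = not CKD) and predictions from three placeholder models.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
predictions = {
    "model_a": [1, 1, 0, 0, 1, 0, 1, 0],
    "model_b": [1, 0, 0, 0, 1, 0, 1, 1],
    "model_c": [0, 1, 1, 0, 1, 1, 1, 0],
}

ranking = rank_by_accuracy(y_true, predictions)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In practice the predictions would come from fitted classifiers evaluated on a held-out split. Accuracy is the only metric the abstract reports.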
Iftekhar Ahmed
Associate Professor, University of California, Irvine
Software Engineering, Software Testing, Machine Learning
Tanzil Ebad Chowdhury
Department of CSE, Leading University, Sylhet, Bangladesh
Biggo Bushon Routh
Department of CSE, Leading University, Sylhet, Bangladesh
Nafisa Tasmiya
Department of CSE, Leading University, Sylhet, Bangladesh
Shadman Sakib
Department of CSE, Leading University, Sylhet, Bangladesh
Adil Ahmed Chowdhury
MS Candidate, Department of CS, Bishop's University
Machine Learning, Deep Learning, Speech Synthesis, Voice Cloning, Transfer Learning