🤖 AI Summary
This study addresses delayed clinical diagnosis and class imbalance in early dementia prediction by proposing a supervised learning framework that integrates structured health records with unstructured clinical text. The authors apply SMOTE oversampling to mitigate class imbalance and TF-IDF vectorization to encode textual features. The evaluated classifiers include K-nearest neighbors (KNN), quadratic discriminant analysis (QDA), Gaussian process classification, and linear discriminant analysis (LDA). Among these, LDA achieves the highest test-set accuracy, 98%. Interpretability analyses further associate dementia risk with the APOE-ε4 allele and with comorbid chronic conditions such as diabetes, so the pipeline offers both predictive performance and clinically meaningful insight.
📝 Abstract
Dementia is a complex syndrome that impairs cognitive and emotional function; Alzheimer's disease is its most common form. This study aims to improve dementia prediction by applying machine learning (ML) techniques to patient health data. The supervised learning algorithms applied include K-Nearest Neighbors (KNN), Quadratic Discriminant Analysis (QDA), Linear Discriminant Analysis (LDA), and Gaussian Process Classifiers. The Synthetic Minority Over-sampling Technique (SMOTE) is used to address class imbalance, and Term Frequency-Inverse Document Frequency (TF-IDF) vectorization is used to encode text features. Among the models, LDA achieves the highest testing accuracy, 98%. The study also highlights the importance of model interpretability and the correlation of dementia with features such as the presence of the APOE-epsilon4 allele and chronic conditions like diabetes. It advocates further ML innovation, particularly the integration of explainable AI approaches, to improve predictive capabilities in dementia care.
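To make the oversampling step concrete, the following is a minimal, standard-library sketch of the interpolation idea behind SMOTE. It is not the authors' implementation: the function name `smote`, its parameters, and the toy feature vectors are assumptions made for illustration, and a real pipeline would more likely rely on an established library. Each synthetic point is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples
    and their k nearest minority-class neighbours (simplified SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point somewhere on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up feature vectors (hypothetical; not data from the study).
majority = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.2, 0.4], [0.4, 0.1]]
minority = [[2.0, 2.1], [2.3, 1.9], [1.8, 2.2]]

# Generate enough synthetic minority points to match the majority class size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Oversampling of this kind is normally applied to the training split only, so that synthetic points derived from test samples do not inflate the reported accuracy.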