Feature selection strategies for optimized heart disease diagnosis using ML and DL models

📅 2025-03-20
🤖 AI Summary
This study systematically evaluates how three feature selection methods, mutual information (MI), ANOVA, and the chi-square test, affect eleven machine learning and deep learning models for heart disease diagnosis, including neural networks, random forests, and logistic regression. MI gives the best results for the more complex models: neural networks reach 82.3% accuracy and 0.94 recall, while logistic regression and random forests reach 82.1% and 80.99% accuracy, respectively. Simpler models (Naive Bayes and decision trees) remain competitive under ANOVA or chi-square selection, with accuracies between 75.99% and 76.45%. The findings offer empirical guidance for matching the feature selection method to the chosen classifier when building diagnostic tools for clinical use.

📝 Abstract
Heart disease remains one of the leading causes of morbidity and mortality worldwide, necessitating effective diagnostic tools that support early diagnosis and clinical decision-making. This study evaluates the impact of three feature selection techniques, Mutual Information (MI), Analysis of Variance (ANOVA), and Chi-Square, on the predictive performance of various machine learning (ML) and deep learning (DL) models using a dataset of clinical indicators for heart disease. Eleven ML/DL models were assessed using metrics such as precision, recall, AUC score, F1-score, and accuracy. Results indicate that MI outperformed the other methods, particularly for advanced models such as neural networks, which achieved the highest accuracy of 82.3% and a recall of 0.94. Logistic regression (accuracy 82.1%) and random forest (accuracy 80.99%) also improved with MI. Simpler models such as Naive Bayes and decision trees achieved comparable results with ANOVA and Chi-Square, yielding accuracies of 76.45% and 75.99%, respectively, making them computationally efficient alternatives. Conversely, k-Nearest Neighbors (KNN) and Support Vector Machines (SVM) performed poorly, with accuracies between 51.52% and 54.43% regardless of the feature selection method. This study provides a comprehensive comparison of feature selection methods for heart disease prediction and demonstrates the critical role of feature selection in optimizing model performance. The results offer practical guidance for choosing a feature selection technique based on the classification algorithm, contributing to more accurate and efficient diagnostic tools for clinical decision-making in cardiology.
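The comparison described in the abstract can be sketched with scikit-learn's `SelectKBest`, which accepts each of the three scoring functions. This is only an illustration under stated assumptions, not the authors' pipeline: the data is synthetic (`make_classification`), and the number of kept features (`k=8`), the single classifier (logistic regression), the min-max scaling step, and 5-fold cross-validation are all choices made here for the example.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.feature_selection import (
    SelectKBest,
    chi2,
    f_classif,
    mutual_info_classif,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for a table of clinical indicators (assumption:
# 13 features, binary target). No real patient data is used here.
X, y = make_classification(
    n_samples=400, n_features=13, n_informative=5, random_state=0
)

# The three scoring functions compared in the abstract.
scorers = {
    "MI": partial(mutual_info_classif, random_state=0),
    "ANOVA": f_classif,
    "Chi-Square": chi2,
}

results = {}
for name, score_func in scorers.items():
    pipeline = make_pipeline(
        MinMaxScaler(),  # chi2 requires non-negative feature values
        SelectKBest(score_func=score_func, k=8),  # k=8 is an arbitrary choice
        LogisticRegression(max_iter=1000),
    )
    # Selection happens inside each fold, so test folds do not leak
    # into the feature ranking.
    scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
    results[name] = scores.mean()

for name, accuracy in results.items():
    print(f"{name}: mean CV accuracy = {accuracy:.3f}")
```

Placing the selector inside the pipeline keeps the feature ranking from seeing held-out samples; swapping `LogisticRegression` for another estimator repeats the comparison for a different model.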
Problem

Research questions and friction points this paper is trying to address.

Evaluates feature selection impact on ML/DL models for heart disease diagnosis
Compares MI, ANOVA, Chi-Square to optimize model accuracy and recall
Guides algorithm-specific feature selection for improved clinical decision-making
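For reference, the label-based metrics named in the abstract (accuracy, precision, recall, F1-score) all follow from the four confusion-matrix counts. The sketch below uses made-up labels rather than data from the study, and `classification_metrics` is a hypothetical helper name. AUC is left out because it needs predicted scores, not hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative labels only: 5 diseased and 5 healthy cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 1]

metrics = classification_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in metrics.items()})
# {'accuracy': 0.7, 'precision': 0.667, 'recall': 0.8, 'f1': 0.727}
```

In a diagnostic setting recall counts the share of true disease cases that are caught, which is why the abstract reports it alongside accuracy.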
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Mutual Information for feature selection
Evaluates ML/DL models with clinical data
Achieves 82.3% accuracy with neural networks
Bilal Ahmad
School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China
Jinfu Chen
Wuhan University
Software performance, software log mining, mining software repository
Haibao Chen
School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China