🤖 AI Summary
Early prediction of cardiovascular disease (CVD) is critical for reducing global morbidity and mortality, yet existing methods fail to effectively model complex, high-dimensional relationships among clinical tabular features. To address this, we propose CardioTabNet, a hybrid framework integrating Tab Transformer with classical machine learning. It is the first to employ Tab Transformer for clinical feature representation and automatic feature ordering; it also incorporates random forest-guided feature selection and an ExtraTree classifier optimized via hyperparameter tuning. Furthermore, a nomogram is derived to provide clinically interpretable risk quantification. Evaluated on a public dataset of 1,190 patients, CardioTabNet achieves a mean accuracy of 94.1% and a mean AUC of 95.0%, significantly outperforming state-of-the-art methods. The framework establishes a new paradigm for CVD early screening, characterized by high predictive accuracy, clinical interpretability, and deployability.
📝 Abstract
The early detection and prediction of cardiovascular diseases are crucial for reducing the severe morbidity and mortality associated with these conditions worldwide. Transformers use a multi-headed self-attention mechanism, widely used in natural language processing (NLP), to model interactions among features. However, the relationships between features within biological systems remain ambiguous in these feature spaces. We address this issue with CardioTabNet, which exploits the strength of the Tab Transformer to extract a feature space that captures a strong understanding of clinical cardiovascular data and its feature ranking. As a result, the performance of downstream classical models improved significantly. Our study uses an open-source heart disease prediction dataset with 1,190 instances and 11 features. In total, the 11 features are divided into numerical (age, resting blood pressure, cholesterol, maximum heart rate, old peak, weight, and fasting blood sugar) and categorical (resting ECG, exercise angina, and ST slope). The Tab Transformer was used to extract important features, which were then ranked using the random forest (RF) feature ranking algorithm. Ten machine-learning models were then trained on the selected features to predict heart disease. After extracting high-quality features, the top downstream model (a hyper-tuned ExtraTree classifier) achieved an average accuracy of 94.1% and an average area under the curve (AUC) of 95.0%. Furthermore, a nomogram analysis was conducted to evaluate the model's effectiveness in cardiovascular risk assessment. Finally, the transformer-driven framework was benchmarked against state-of-the-art models.
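The downstream stages described in the abstract (random forest feature ranking followed by a tuned ExtraTree classifier) can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the authors' implementation: the synthetic matrix from `make_classification` merely stands in for the features a Tab Transformer would produce, and the feature count, the number of features kept, the hyperparameter grid, and the 5-fold setup are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Synthetic stand-in for Tab Transformer features (assumption): this is
# NOT the heart disease dataset, only a matrix with the same row count.
X, y = make_classification(
    n_samples=1190, n_features=32, n_informative=8, random_state=0
)

pipeline = Pipeline([
    # Rank features by random forest importance and keep the top 16.
    # threshold=-inf makes max_features the only selection criterion.
    ("rank", SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=0),
        max_features=16,
        threshold=-float("inf"),
    )),
    # Downstream classifier whose hyperparameters are tuned below.
    ("clf", ExtraTreesClassifier(random_state=0)),
])

# Small illustrative grid (assumption); the paper's search space is not
# given in the abstract.
param_grid = {
    "clf__n_estimators": [100, 200],
    "clf__max_depth": [None, 10],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    scoring={"accuracy": "accuracy", "auc": "roc_auc"},
    refit="auc",  # pick the configuration with the best mean AUC
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

# Mean cross-validated scores of the selected configuration.
best = search.best_index_
mean_accuracy = search.cv_results_["mean_test_accuracy"][best]
mean_auc = search.cv_results_["mean_test_auc"][best]
print(f"best params: {search.best_params_}")
print(f"mean accuracy: {mean_accuracy:.3f}, mean AUC: {mean_auc:.3f}")
```

Placing the ranking step inside the pipeline means feature selection is refit on each training fold, so the reported cross-validated accuracy and AUC are not inflated by selecting features on held-out data. The scores printed here reflect the synthetic data only and say nothing about the results reported in the abstract.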