Improving Cardiac Risk Prediction Using Data Generation Techniques

📅 2025-12-19

📈 Citations: 0

✨ Influential: 0

career value

187K/year

🤖 AI Summary

Cardiac rehabilitation research is hindered by the scarcity, high missingness, and heterogeneity of real-world clinical data, limiting the performance of risk prediction models. To address this, we propose the first Conditional Variational Autoencoder (CVAE) framework tailored for cardiac rehabilitation process modeling, integrating clinical temporal feature representation with a missingness-aware enhancement mechanism to generate high-fidelity synthetic data—pathologically coherent, temporally plausible, and compatible across heterogeneous sources. Our method significantly improves downstream model robustness under low-data and high-missingness regimes: multiple risk classifiers achieve an average accuracy gain of 7.2%, outperforming state-of-the-art generative approaches. Moreover, it mitigates dataset bias and enables reliable risk stratification without requiring exercise stress testing.

Technology Category

Application Category

📝 Abstract

Cardiac rehabilitation constitutes a structured clinical process involving multiple interdependent phases, individualized medical decisions, and the coordinated participation of diverse healthcare professionals. This sequential and adaptive nature enables the program to be modeled as a business process, thereby facilitating its analysis. Nevertheless, studies in this context face significant limitations inherent to real-world medical databases: data are often scarce due to both economic costs and the time required for collection; many existing records are not suitable for specific analytical purposes; and, finally, there is a high prevalence of missing values, as not all patients undergo the same diagnostic tests. To address these limitations, this work proposes an architecture based on a Conditional Variational Autoencoder (CVAE) for the synthesis of realistic clinical records that are coherent with real-world observations. The primary objective is to increase the size and diversity of the available datasets in order to enhance the performance of cardiac risk prediction models and to reduce the need for potentially hazardous diagnostic procedures, such as exercise stress testing. The results demonstrate that the proposed architecture is capable of generating coherent and realistic synthetic data, whose use improves the accuracy of the various classifiers employed for cardiac risk detection, outperforming state-of-the-art deep learning approaches for synthetic data generation.

Problem

Research questions and friction points this paper is trying to address.

Generates synthetic clinical data to overcome data scarcity

Enhances cardiac risk prediction model performance using generated data

Reduces reliance on hazardous diagnostic procedures through data augmentation

Innovation

Methods, ideas, or system contributions that make the work stand out.

CVAE generates realistic synthetic clinical records

Synthetic data improves cardiac risk prediction accuracy

Reduces need for hazardous diagnostic procedures

🔎 Similar Papers

No similar papers found.