Improving Cardiac Risk Prediction Using Data Generation Techniques

📅 2025-12-19
🤖 AI Summary
Cardiac rehabilitation research is hindered by the scarcity, high missingness, and heterogeneity of real-world clinical data, which limits the performance of risk prediction models. To address this, we propose the first Conditional Variational Autoencoder (CVAE) framework tailored for cardiac rehabilitation process modeling. It integrates clinical temporal feature representation with a missingness-aware enhancement mechanism to generate high-fidelity synthetic data that is pathologically coherent, temporally plausible, and compatible across heterogeneous sources. Our method significantly improves downstream model robustness under low-data and high-missingness regimes: multiple risk classifiers achieve an average accuracy gain of 7.2%, outperforming state-of-the-art generative approaches. Moreover, it mitigates dataset bias and enables reliable risk stratification without requiring exercise stress testing.

📝 Abstract
Cardiac rehabilitation constitutes a structured clinical process involving multiple interdependent phases, individualized medical decisions, and the coordinated participation of diverse healthcare professionals. This sequential and adaptive nature enables the program to be modeled as a business process, thereby facilitating its analysis. Nevertheless, studies in this context face significant limitations inherent to real-world medical databases: data are often scarce due to both economic costs and the time required for collection; many existing records are not suitable for specific analytical purposes; and, finally, there is a high prevalence of missing values, as not all patients undergo the same diagnostic tests. To address these limitations, this work proposes an architecture based on a Conditional Variational Autoencoder (CVAE) for the synthesis of realistic clinical records that are coherent with real-world observations. The primary objective is to increase the size and diversity of the available datasets in order to enhance the performance of cardiac risk prediction models and to reduce the need for potentially hazardous diagnostic procedures, such as exercise stress testing. The results demonstrate that the proposed architecture is capable of generating coherent and realistic synthetic data, whose use improves the accuracy of the various classifiers employed for cardiac risk detection, outperforming state-of-the-art deep learning approaches for synthetic data generation.
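
As an illustration of the kind of objective a CVAE trained on incomplete clinical records might optimize, the following is a minimal sketch and not the authors' architecture, which is not detailed on this page. It assumes a fixed-variance Gaussian likelihood (so reconstruction reduces to squared error), a standard normal prior, a binary observation mask so that missing test results do not contribute to the reconstruction term, and a one-hot condition such as a risk class. All function names, the toy record, and the `beta` weight are hypothetical; the encoder and decoder networks are replaced by hand-written stand-in values.

```python
import math
import random


def one_hot(label, num_classes):
    """Encode the conditioning label (e.g. a risk class) as a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one draw per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def masked_squared_error(x, x_hat, observed):
    """Squared error over observed features only; missing entries are skipped."""
    return sum((a - b) ** 2 for a, b, o in zip(x, x_hat, observed) if o)


def cvae_loss(x, x_hat, observed, mu, log_var, beta=1.0):
    """Masked reconstruction term plus beta-weighted KL term (assumed objective)."""
    return (masked_squared_error(x, x_hat, observed)
            + beta * kl_to_standard_normal(mu, log_var))


# Toy record with one missing test result (None) and its observation mask.
x = [0.5, None, 1.0]
observed = [v is not None for v in x]
x_hat = [0.4, 0.9, 1.2]                # stand-in for a decoder output
mu, log_var = [0.0, 0.0], [0.0, 0.0]   # stand-in for an encoder output

rng = random.Random(0)
z = reparameterize(mu, log_var, rng)
decoder_input = z + one_hot(1, 3)      # a decoder would receive [z ; condition]
loss = cvae_loss(x, x_hat, observed, mu, log_var)
```

Under these assumptions, new records would be synthesized after training by sampling z from the standard normal prior, appending the desired condition, and passing the result through the trained decoder. Masking the reconstruction term is one common way to handle records in which not every patient has undergone every diagnostic test.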
Problem

Research questions and friction points this paper is trying to address.

Generates synthetic clinical data to overcome data scarcity
Enhances cardiac risk prediction model performance using generated data
Reduces reliance on hazardous diagnostic procedures through data augmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

CVAE generates realistic synthetic clinical records
Synthetic data improves cardiac risk prediction accuracy
Reduces need for hazardous diagnostic procedures
Alexandre Cabodevila
Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, Santiago de Compostela, SPAIN
Pedro Gamallo-Fernández
Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, Santiago de Compostela, SPAIN
Juan C. Vidal
Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, Santiago de Compostela, SPAIN
Manuel Lama
CiTIUS, University of Santiago de Compostela
Process Mining, Prediction, Business Process Management, Ontologies