Evaluation metrics for temporal preservation in synthetic longitudinal patient data

📅 2026-02-11
🤖 AI Summary
This study addresses a gap in evaluating the temporal fidelity of synthetic longitudinal patient data: generated records are rarely assessed for how well they capture patient dynamics over time. The authors propose a multidimensional set of metrics that assesses temporal structure preservation at both the individual and population levels across four dimensions: marginal distributions, covariance structures, individual trajectories, and measurement patterns. The evaluation also accounts for data preprocessing choices (binning, variable encoding, and numeric precision) and shows that relying on marginal distributions alone can hide distortions in temporal dependencies. The authors conclude that evaluating all four dimensions together gives a more complete picture of synthetic data quality and can guide the refinement of generative models toward temporally realistic longitudinal patient data.

📝 Abstract
This study introduces a set of metrics for evaluating temporal preservation in synthetic longitudinal patient data, defined as artificially generated data that mimic real patients' repeated measurements over time. The proposed metrics assess how well synthetic data reproduce key temporal characteristics, categorized into marginal, covariance, individual-level, and measurement structures. We show that strong marginal-level resemblance may conceal distortions in covariance and disruptions in individual-level trajectories. Temporal preservation is influenced by factors such as original data quality, measurement frequency, and preprocessing strategies, including binning, variable encoding, and precision. Variables with sparse or highly irregular measurement times provide limited information for learning temporal dependencies, resulting in reduced resemblance between the synthetic and original data. No single metric adequately captures temporal preservation; instead, a multidimensional evaluation across all characteristics provides a more comprehensive assessment of synthetic data quality. Overall, the proposed metrics clarify how and why temporal structures are preserved or degraded, enabling more reliable evaluation and improvement of generative models and supporting the creation of temporally realistic synthetic longitudinal patient data.
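
The claim that marginal resemblance can hide covariance and trajectory distortions can be illustrated with a small simulation. The sketch below is not the paper's metric set. It assumes a single variable measured at six fixed visits, simulates AR(1) trajectories as stand-in "original" data, and builds a toy "synthetic" dataset by shuffling patients independently at each visit. All function names (make_ar1_panel, shuffle_within_visit, marginal_gap, correlation_gap, mean_abs_step) and the three simple metrics are hypothetical choices made for illustration.

```python
import numpy as np


def make_ar1_panel(n_patients, n_visits, phi, rng):
    """Simulate one variable per patient over fixed visits as an AR(1) process."""
    x = np.empty((n_patients, n_visits))
    x[:, 0] = rng.normal(size=n_patients)
    for t in range(1, n_visits):
        # Noise scale keeps the variance at every visit equal to 1.
        noise = rng.normal(scale=np.sqrt(1 - phi**2), size=n_patients)
        x[:, t] = phi * x[:, t - 1] + noise
    return x


def shuffle_within_visit(x, rng):
    """Toy generator: permute patients independently at each visit.

    Per-visit marginals are kept exactly; within-patient links are broken.
    """
    out = np.empty_like(x)
    for t in range(x.shape[1]):
        out[:, t] = rng.permutation(x[:, t])
    return out


def marginal_gap(a, b):
    """Largest gap between per-visit sorted values (0 means identical marginals)."""
    return float(np.max(np.abs(np.sort(a, axis=0) - np.sort(b, axis=0))))


def correlation_gap(a, b):
    """Frobenius norm of the difference between visit-by-visit correlation matrices."""
    diff = np.corrcoef(a, rowvar=False) - np.corrcoef(b, rowvar=False)
    return float(np.linalg.norm(diff))


def mean_abs_step(x):
    """Average absolute change between consecutive visits within a patient."""
    return float(np.mean(np.abs(np.diff(x, axis=1))))


rng = np.random.default_rng(0)
original = make_ar1_panel(n_patients=2000, n_visits=6, phi=0.9, rng=rng)
synthetic = shuffle_within_visit(original, rng)

print("marginal gap:      ", marginal_gap(original, synthetic))
print("correlation gap:   ", correlation_gap(original, synthetic))
print("step size original:", mean_abs_step(original))
print("step size synthetic:", mean_abs_step(synthetic))
```

In this toy setting the marginal check reports a perfect match, while the correlation and step-size checks both flag the shuffled data as temporally unrealistic. That is the pattern the abstract warns about when evaluation stops at the marginal level.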
Problem

Research questions and friction points this paper is trying to address.

temporal preservation
synthetic longitudinal data
evaluation metrics
patient data
time-series fidelity
Innovation

Methods, ideas, or system contributions that make the work stand out.

temporal preservation
synthetic longitudinal data
evaluation metrics
generative models
patient trajectories