Enabling Granular Subgroup Level Model Evaluations by Generating Synthetic Medical Time Series

📅 2025-10-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses trustworthy model evaluation under privacy constraints on ICU time-series data, both at the population level and within fine-grained subgroups (e.g., age × sex × race intersections). We propose Enhanced TimeAutoDiff, a framework that combines latent-space diffusion modeling with distribution-alignment regularization in a unified VAE-diffusion architecture, jointly optimizing synthetic data fidelity and statistical representativeness. Evaluated on MIMIC-III and eICU, the method reduces the TRTS (Train-on-Real, Test-on-Synthetic) performance gap by over 70% and cuts subgroup AUROC estimation error by up to 50%. Across 32 intersectional subgroups, evaluation on large synthetic cohorts is more accurate than evaluation on small real test sets in 72%–84% of cases. To our knowledge, this is the first approach enabling high-fidelity, privacy-preserving, and subgroup-generalizable evaluation for critical care AI models.

📝 Abstract

We present a novel framework for leveraging synthetic ICU time-series data not only to train but also to rigorously and trustworthily evaluate predictive models, both at the population level and within fine-grained demographic subgroups. Building on prior diffusion and VAE-based generators (TimeDiff, HealthGen, TimeAutoDiff), we introduce Enhanced TimeAutoDiff, which augments the latent diffusion objective with distribution-alignment penalties. We extensively benchmark all models on MIMIC-III and eICU, on 24-hour mortality and binary length-of-stay tasks. Our results show that Enhanced TimeAutoDiff reduces the gap between real-on-synthetic and real-on-real evaluation ("TRTS gap") by over 70%, achieving $\Delta_{TRTS} \leq 0.014$ AUROC, while preserving training utility ($\Delta_{TSTR} \approx 0.01$). Crucially, for 32 intersectional subgroups, large synthetic cohorts cut subgroup-level AUROC estimation error by up to 50% relative to small real test sets, and outperform them in 72–84% of subgroups. This work provides a practical, privacy-preserving roadmap for trustworthy, granular model evaluation in critical care, enabling robust and reliable performance analysis across diverse patient populations without exposing sensitive EHR data, contributing to the overall trustworthiness of Medical AI.
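
To make the TRTS gap concrete, the following is a minimal sketch of how such a gap could be computed from model scores. The abstract defines the gap only as the difference between real-on-synthetic and real-on-real evaluation; the use of an absolute difference, the pairwise AUROC helper, and the function names `auroc` and `evaluation_gap` are assumptions made for illustration, not the authors' code.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, counting ties as 0.5 (pairwise Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def evaluation_gap(auroc_reference, auroc_other):
    """Absolute AUROC difference (assumed form of the gap)."""
    return abs(auroc_other - auroc_reference)


# Toy scores from one model trained on real data (hypothetical values).
real_labels, real_scores = [0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]
synth_labels, synth_scores = [0, 1, 0, 1], [0.2, 0.9, 0.3, 0.6]

trtr = auroc(real_labels, real_scores)    # train on real, test on real
trts = auroc(synth_labels, synth_scores)  # train on real, test on synthetic
delta_trts = evaluation_gap(trtr, trts)
```

Under the same assumption, a TSTR gap would compare a model trained on synthetic data and tested on real data against the real-on-real reference.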

Problem

Research questions and friction points this paper is trying to address.

Enabling granular subgroup-level evaluation of medical predictive models

Generating synthetic ICU time-series data for privacy-preserving model assessment

Reducing performance estimation gaps between synthetic and real data evaluations
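
The subgroup-level comparison described above can be sketched as follows: compute AUROC per intersectional subgroup on a test cohort, then measure how far each estimate falls from a reference value. The record layout `(subgroup_key, label, score)`, the subgroup keys, and the function names are hypothetical, and the absolute-error definition is an assumption; the summary does not specify how the authors compute estimation error.

```python
from collections import defaultdict


def auroc(labels, scores):
    """Pairwise AUROC; ties between a positive and a negative count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def subgroup_auroc(records):
    """Group (subgroup_key, label, score) records and return AUROC per key.

    Subgroups that lack either class are skipped, since AUROC is undefined
    there. This is common for very small real test sets.
    """
    groups = defaultdict(lambda: ([], []))
    for key, label, score in records:
        groups[key][0].append(label)
        groups[key][1].append(score)
    return {
        key: auroc(labels, scores)
        for key, (labels, scores) in groups.items()
        if 0 < sum(labels) < len(labels)
    }


def estimation_error(estimates, reference):
    """Absolute AUROC error per subgroup present in both mappings."""
    return {k: abs(v - reference[k]) for k, v in estimates.items() if k in reference}


# Hypothetical records keyed by (age band, sex, race).
records = [
    (("65+", "F", "A"), 0, 0.2),
    (("65+", "F", "A"), 1, 0.7),
    (("65+", "F", "A"), 1, 0.1),
    (("<65", "M", "B"), 0, 0.3),  # this subgroup has no positives
    (("<65", "M", "B"), 0, 0.4),
]
estimates = subgroup_auroc(records)
errors = estimation_error(estimates, {("65+", "F", "A"): 0.8})
```

Running the same two functions on a small real test set and on a large synthetic cohort, against a shared reference, gives the per-subgroup errors that a comparison like the one in the paper would rank.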

Innovation

Methods, ideas, or system contributions that make the work stand out.

Enhanced TimeAutoDiff adds distribution-alignment penalties to the latent diffusion objective

Generates synthetic ICU time-series for subgroup model evaluation

Reduces TRTS gap by over 70% while preserving utility
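
As an illustration of what adding a distribution-alignment penalty to a generative objective can look like, here is a minimal sketch. The summary does not say which penalty Enhanced TimeAutoDiff uses, so this example substitutes simple per-feature moment matching (mean and variance); the function names, the `weight` parameter, and its default value are all assumptions.

```python
from statistics import fmean, pvariance


def moment_alignment_penalty(real_batch, synth_batch):
    """Sum over features of squared differences in mean and in variance.

    Each batch is a list of equal-length feature rows. This is a stand-in
    alignment term, not the penalty from the paper.
    """
    penalty = 0.0
    for j in range(len(real_batch[0])):
        real_col = [row[j] for row in real_batch]
        synth_col = [row[j] for row in synth_batch]
        penalty += (fmean(real_col) - fmean(synth_col)) ** 2
        penalty += (pvariance(real_col) - pvariance(synth_col)) ** 2
    return penalty


def total_loss(diffusion_loss, real_batch, synth_batch, weight=0.1):
    """Base generative loss plus a weighted alignment penalty (assumed form)."""
    return diffusion_loss + weight * moment_alignment_penalty(real_batch, synth_batch)


# Toy batches: the first feature is shifted by 1.0, the second matches.
real = [[0.0, 1.0], [2.0, 3.0]]
synth = [[1.0, 1.0], [3.0, 3.0]]
penalty = moment_alignment_penalty(real, synth)
loss = total_loss(0.5, real, synth, weight=0.1)
```

The penalty is zero when the synthetic batch reproduces the real batch's first two moments, so the weight controls how strongly the generator is pulled toward statistical representativeness relative to the base loss.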
