AI Summary
Existing differential privacy methods for longitudinal tabular data often compromise temporal consistency by flattening users' historical records into high-dimensional vectors. This work proposes PATH, a novel framework that, for the first time, leverages large language models for privacy-preserving generation of longitudinal tabular data. PATH treats each user's entire subtable as a single generation unit and employs differentially private fine-tuning combined with an autoregressive mechanism to directly model complete event sequences, thereby preserving long-range temporal dependencies. Experimental results demonstrate that, compared to state-of-the-art marginal-based mechanisms, PATH achieves over 60% reduction in distributional distance and nearly 50% fewer state transition errors while maintaining fidelity to marginal distributions, significantly enhancing temporal realism in synthetic data.
Abstract
Research on differentially private synthetic tabular data has largely focused on independent and identically distributed rows, where each record corresponds to a unique individual. This perspective neglects the temporal complexity of longitudinal datasets, such as electronic health records, where a user contributes an entire (sub)table of sequential events. While practitioners might attempt to model such data by flattening user histories into high-dimensional vectors for use with standard marginal-based mechanisms, we demonstrate that this strategy is insufficient: flattening fails to preserve temporal coherence even when it maintains valid marginal distributions. We introduce PATH, a novel generative framework that treats the full table as the unit of synthesis and leverages the autoregressive capabilities of privately fine-tuned large language models. Extensive evaluations show that PATH effectively captures long-range dependencies that traditional methods miss. Empirically, our method reduces the distributional distance to real trajectories by over 60% and cuts state transition errors by nearly 50% compared to leading marginal mechanisms, while achieving similar marginal fidelity.
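The contrast between the two modeling strategies can be sketched schematically. The snippet below is an illustrative toy (not the paper's code): `flatten` shows the baseline of turning a user's event subtable into one wide row, where marginal-based mechanisms then model columns largely independently, while `serialize` shows the PATH-style alternative of keeping the whole subtable as one unit, rendered as a token sequence for an autoregressive model. The field names and serialization format are hypothetical.

```python
# Toy subtable for one user: a sequence of (timestep, state) events.
# Field names ("t", "state") are illustrative, not from the paper.
user_events = [
    {"t": 0, "state": "healthy"},
    {"t": 1, "state": "sick"},
    {"t": 2, "state": "recovered"},
]

def flatten(events, horizon):
    """Baseline strategy: flatten the history into a single wide row with
    one column per timestep. Marginal-based DP mechanisms then fit
    (low-order) marginals over these columns, which is where long-range
    temporal structure gets lost."""
    row = {f"state_t{e['t']}": e["state"] for e in events}
    for t in range(horizon):
        row.setdefault(f"state_t{t}", None)  # pad unused timesteps
    return row

def serialize(events):
    """PATH-style strategy (schematically): keep the entire subtable as one
    generation unit, serialized as a text sequence that a privately
    fine-tuned language model can model autoregressively."""
    return " ; ".join(f"t={e['t']} state={e['state']}" for e in events)

flat_row = flatten(user_events, horizon=4)
sequence = serialize(user_events)
```

Here `flat_row` has a fixed, padded schema (`state_t0` … `state_t3`), whereas `sequence` preserves the events as an ordered whole, which is what lets an autoregressive model condition each event on the full prefix.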