AI Summary
Existing differential privacy methods for longitudinal tabular data often compromise temporal consistency by flattening users' historical records into high-dimensional vectors. This work proposes PATH, a novel framework that, for the first time, leverages large language models for privacy-preserving generation of longitudinal tabular data. PATH treats each user's entire subtable as a single generation unit and employs differentially private fine-tuning combined with an autoregressive mechanism to directly model complete event sequences, thereby preserving long-range temporal dependencies. Experimental results demonstrate that, compared to state-of-the-art marginal-based mechanisms, PATH achieves over 60% reduction in distributional distance and nearly 50% fewer state transition errors while maintaining fidelity to marginal distributions, significantly enhancing temporal realism in synthetic data.
Abstract
Research on differentially private synthetic tabular data has largely focused on independent and identically distributed rows, where each record corresponds to a unique individual. This perspective neglects the temporal complexity of longitudinal datasets, such as electronic health records, where a user contributes an entire (sub)table of sequential events. While practitioners might attempt to model such data by flattening user histories into high-dimensional vectors for use with standard marginal-based mechanisms, we demonstrate that this strategy is insufficient: flattening fails to preserve temporal coherence even when it maintains valid marginal distributions. We introduce PATH, a novel generative framework that treats the full table as the unit of synthesis and leverages the autoregressive capabilities of privately fine-tuned large language models. Extensive evaluations show that PATH effectively captures long-range dependencies that traditional methods miss. Empirically, our method reduces the distributional distance to real trajectories by over 60% and cuts state transition errors by nearly 50% compared to leading marginal mechanisms, while achieving similar marginal fidelity.
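The contrast between the two modeling strategies can be sketched schematically. The snippet below is an illustrative toy (not the paper's code): `flatten` shows the baseline of turning a user's event subtable into one wide row, where marginal-based mechanisms then model columns largely independently, while `serialize` shows the PATH-style alternative of keeping the whole subtable as one unit, rendered as a token sequence for an autoregressive model. The field names and serialization format are hypothetical.

```python
# Toy subtable for one user: a sequence of (timestep, state) events.
# Field names ("t", "state") are illustrative, not from the paper.
user_events = [
    {"t": 0, "state": "healthy"},
    {"t": 1, "state": "sick"},
    {"t": 2, "state": "recovered"},
]

def flatten(events, horizon):
    """Baseline strategy: flatten the history into a single wide row with
    one column per timestep. Marginal-based DP mechanisms then fit
    (low-order) marginals over these columns, which is where long-range
    temporal structure gets lost."""
    row = {f"state_t{e['t']}": e["state"] for e in events}
    for t in range(horizon):
        row.setdefault(f"state_t{t}", None)  # pad unused timesteps
    return row

def serialize(events):
    """PATH-style strategy (schematically): keep the entire subtable as one
    generation unit, serialized as a text sequence that a privately
    fine-tuned language model can model autoregressively."""
    return " ; ".join(f"t={e['t']} state={e['state']}" for e in events)

flat_row = flatten(user_events, horizon=4)
sequence = serialize(user_events)
```

Here `flat_row` has a fixed, padded schema (`state_t0` … `state_t3`), whereas `sequence` preserves the events as an ordered whole, which is what lets an autoregressive model condition each event on the full prefix.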