Federated Timeline Synthesis: Scalable and Private Methodology For Model Training and Deployment

📅 2025-06-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address privacy preservation, cross-institutional scalability, and multi-task generalization challenges in training generative foundation models on distributed electronic health records (EHRs), this paper proposes the Federated Timeline Synthesis (FTS) framework. FTS models patient histories as tokenized Patient Health Timelines (PHTs) and trains an autoregressive Transformer locally at each institution; only model weights are uploaded to a central server, which uses them to train a Global Generator. By combining time-series tokenization, federated learning, and Monte Carlo simulation, FTS supports zero-shot inference, counterfactual reasoning, and early clinical warning. Evaluated on MIMIC-IV, models trained on FTS-synthesized data achieve performance comparable to those trained on real data across five clinical prediction tasks. This validates FTS's strong privacy guarantees (no raw data sharing), effective longitudinal modeling, and robust scalability across heterogeneous healthcare institutions.

📝 Abstract
We present Federated Timeline Synthesis (FTS), a novel framework for training generative foundation models across distributed time-series data, applied to electronic health records (EHR). At its core, FTS represents patient history as tokenized Patient Health Timelines (PHTs), language-agnostic sequences encoding temporal, categorical, and continuous clinical information. Each institution trains an autoregressive transformer on its local PHTs and transmits only model weights to a central server. The server uses the generators to synthesize a large corpus of trajectories and train a Global Generator (GG), enabling zero-shot inference via Monte Carlo simulation of future PHTs. We evaluate FTS on five clinically meaningful prediction tasks using MIMIC-IV data, showing that models trained on synthetic data generated by GG perform comparably to those trained on real data. FTS offers strong privacy guarantees, scalability across institutions, and extensibility to diverse prediction and simulation tasks, especially in healthcare, including counterfactual inference, early warning detection, and synthetic trial design.
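The PHT encoding described in the abstract can be sketched as follows. This is a minimal illustration assuming a simple scheme (time-gap tokens, raw event codes, coarse value binning); the token names, `tokenize_event` helper, and binning choices are hypothetical, not the paper's actual vocabulary.

```python
# Hypothetical sketch: encode a patient history as a tokenized
# Patient Health Timeline (PHT). Temporal, categorical, and continuous
# information each become discrete tokens in one sequence.
def tokenize_event(event):
    """Map one clinical event to PHT tokens: time gap, code, binned value."""
    tokens = [f"DT_{event['hours_since_prev']}h"]   # temporal information
    tokens.append(event["code"])                    # categorical clinical code
    if "value" in event:
        bin_idx = min(int(event["value"] // 10), 9) # continuous value -> coarse bin
        tokens.append(f"VAL_BIN_{bin_idx}")
    return tokens

history = [
    {"hours_since_prev": 0,  "code": "ADMIT_ED"},
    {"hours_since_prev": 2,  "code": "LAB_CREATININE", "value": 1.4},
    {"hours_since_prev": 24, "code": "DX_AKI"},
]
pht = [tok for ev in history for tok in tokenize_event(ev)]
print(pht)
# -> ['DT_0h', 'ADMIT_ED', 'DT_2h', 'LAB_CREATININE', 'VAL_BIN_0', 'DT_24h', 'DX_AKI']
```

A sequence like this can then be fed to a standard autoregressive transformer, since it is just a stream of discrete tokens.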
Problem

Research questions and friction points this paper is trying to address.

Training generative models on distributed timeseries EHR data
Ensuring privacy and scalability in federated learning
Enabling zero-shot inference for clinical prediction tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tokenized Patient Health Timelines for data encoding
Federated training with local model weight sharing
Global Generator for synthetic data and zero-shot inference
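The three contributions above can be sketched end to end. In this toy sketch a bigram count model stands in for the autoregressive transformer, and all token names, site data, and the ICU-admission query are illustrative assumptions, not the paper's setup; the key structural point it demonstrates is that only models, never raw records, leave the institutions, and that zero-shot prediction falls out of Monte Carlo rollouts from the Global Generator.

```python
import random
from collections import defaultdict

class BigramGenerator:
    """Toy autoregressive generator over PHT tokens (stand-in for a Transformer)."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences):
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                self.counts[a][b] += 1
        return self

    def sample_next(self, token, rng):
        nxt = self.counts.get(token)
        if not nxt:
            return "<EOS>"
        toks, wts = zip(*nxt.items())
        return rng.choices(toks, weights=wts, k=1)[0]

    def generate(self, start="<BOS>", max_len=20, rng=random):
        seq = [start]
        while len(seq) < max_len and seq[-1] != "<EOS>":
            seq.append(self.sample_next(seq[-1], rng))
        return seq

# Hypothetical local PHT corpora at two institutions (never shared).
site_a = [["<BOS>", "AGE_70s", "DX_HF", "LAB_BNP_HI", "ICU_ADMIT", "<EOS>"]] * 50
site_b = [["<BOS>", "AGE_50s", "DX_DM2", "LAB_A1C_HI", "DISCHARGE", "<EOS>"]] * 50

rng = random.Random(0)
# 1. Each institution trains locally; only the model leaves the site.
local_models = [BigramGenerator().fit(site_a), BigramGenerator().fit(site_b)]
# 2. The server synthesizes a trajectory corpus from the local generators.
synthetic = [m.generate(rng=rng) for m in local_models for _ in range(100)]
# 3. The server trains the Global Generator on synthetic data only.
gg = BigramGenerator().fit(synthetic)
# 4. Zero-shot inference: Monte Carlo rollouts of future PHTs from a prefix.
rollouts = [gg.generate(start="DX_HF", max_len=10, rng=rng) for _ in range(200)]
p_icu = sum("ICU_ADMIT" in r for r in rollouts) / len(rollouts)
print(f"P(ICU admission | DX_HF) = {p_icu:.2f}")
```

In the real framework the generators are transformers trained by gradient descent and the rollouts cover rich multi-token futures, but the data flow (local fit, weight upload, server-side synthesis, GG training, Monte Carlo simulation) follows the same shape.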