Frugal, Flexible, Faithful: Causal Data Simulation via Frengression

๐Ÿ“… 2025-08-01
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Robust benchmarking of causal inference methods on real-world data has long been hindered by the scarcity of high-fidelity simulation datasets with controllable interventions. To address this, we propose Frengression, a framework that combines the frugal parameterization with deep generative modeling to learn the joint distribution of covariates, treatments, and outcomes around the causal margin of interest. The model comes with consistency and extrapolation guarantees, supports faithful simulation of multivariate, time-varying data, and permits direct sampling from user-specified interventional distributions. Validation on real-world clinical trial data demonstrates its practical utility for causal simulation studies.

๐Ÿ“ Abstract
Machine learning has revitalized causal inference by combining flexible models and principled estimators, yet robust benchmarking and evaluation remain challenging with real-world data. In this work, we introduce frengression, a deep generative realization of the frugal parameterization that models the joint distribution of covariates, treatments and outcomes around the causal margin of interest. Frengression provides accurate estimation and flexible, faithful simulation of multivariate, time-varying data; it also enables direct sampling from user-specified interventional distributions. Model consistency and extrapolation guarantees are established, with validation on real-world clinical trial data demonstrating frengression's practical utility. We envision this framework sparking new research into generative approaches for causal margin modelling.
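The idea behind the frugal parameterization can be illustrated without any deep learning. The following is a minimal sketch of a hypothetical toy model, not the authors' implementation. It specifies p(x), the observational treatment model p(t | x), the causal margin p*(y | do(t)) directly, and a Gaussian copula linking X and Y. All names and parameter values (`simulate_frugal`, `rho`, `gamma`, `treat_prob`, the uniform covariate, the normal margin) are illustrative assumptions. Because the causal margin is itself a model parameter, the true average treatment effect is known by construction (it equals `beta`). That known ground truth is what makes such simulations useful for benchmarking.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal, used for the Gaussian copula
EPS = 1e-12                # keeps probabilities strictly inside (0, 1) for inv_cdf


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def simulate_frugal(n, beta=1.0, mu=0.0, sigma=1.0, rho=0.6, gamma=2.0,
                    treat_prob=None, seed=0):
    """Sample n (x, t, y) triples from a hypothetical toy frugal model.

    Components (illustrative choices, not the paper's):
      p(x)          X ~ Uniform(0, 1)
      p(t | x)      T ~ Bernoulli(sigmoid(gamma * (x - 0.5)))  [observational]
      p*(y | do t)  Y ~ Normal(mu + beta * t, sigma^2)          [causal margin]
      copula        Gaussian, correlation rho, linking X and Y for each fixed t

    If treat_prob is given, T ~ Bernoulli(treat_prob) independently of X,
    i.e. samples come directly from that interventional distribution.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = rng.random()
        p_treat = sigmoid(gamma * (x - 0.5)) if treat_prob is None else treat_prob
        t = 1 if rng.random() < p_treat else 0
        # Copula step: normal score of X, correlated with fresh Gaussian noise.
        z_x = STD_NORMAL.inv_cdf(min(max(x, EPS), 1.0 - EPS))
        z_y = rho * z_x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # z_y ~ N(0, 1) when X is drawn from p(x), so pushing Phi(z_y) through the
        # causal margin's quantile function gives mu + beta * t + sigma * z_y.
        y = mu + beta * t + sigma * z_y
        samples.append((x, t, y))
    return samples


def naive_difference(samples):
    """Difference in mean outcome between treated and untreated units."""
    y1 = [y for _, t, y in samples if t == 1]
    y0 = [y for _, t, y in samples if t == 0]
    return fmean(y1) - fmean(y0)


def ipw_ate(samples, gamma=2.0):
    """Inverse-probability-weighted ATE using the known propensity score."""
    terms = []
    for x, t, y in samples:
        e = sigmoid(gamma * (x - 0.5))
        terms.append(t * y / e - (1 - t) * y / (1.0 - e))
    return fmean(terms)


if __name__ == "__main__":
    obs = simulate_frugal(20_000)
    print("naive contrast:", round(naive_difference(obs), 3))
    print("IPW estimate:  ", round(ipw_ate(obs), 3))
    do1 = simulate_frugal(20_000, treat_prob=1.0, seed=1)
    print("mean Y under do(T=1):", round(fmean(y for _, _, y in do1), 3))
```

In this toy model the naive contrast is biased upward: larger x raises both the chance of treatment and, through the copula with `rho > 0`, the outcome. Inverse probability weighting with the known propensity should land near `beta`. Passing `treat_prob` replaces the treatment model with a user-specified interventional distribution. Under that distribution the outcome follows the causal margin exactly, so E[Y] equals `mu + beta * treat_prob`. Per the abstract, frengression replaces hand-picked parametric pieces like these with a deep generative model.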
Problem

Research questions and friction points this paper is trying to address.

Robust benchmarking and evaluation in causal inference with real-world data
Flexible and faithful simulation of multivariate, time-varying data
Direct sampling from user-specified interventional distributions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep generative model for causal data simulation
Flexible, faithful simulation of multivariate, time-varying data
Direct sampling from interventional distributions
๐Ÿ”Ž Similar Papers
No similar papers found.
Linying Yang
Department of Statistics, University of Oxford
Robin J. Evans
Department of Statistics, University of Oxford; Pioneer Centre for SMARTbiomed, University of Oxford
Xinwei Shen
University of Washington
Statistics, Machine Learning