Deep and diverse population synthesis for multi-person households using generative models

📅 2025-08-13
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Traditional population synthesis methods such as iterative proportional fitting (IPF) lose fidelity in high-dimensional settings and do not capture structured dependencies within households, while existing deep learning approaches offer few controls over generation and rarely model associations among household members. To address these limitations, the paper proposes a framework built around the conditional input directed acyclic tabular generative adversarial network (ciDATGAN), a tabular GAN that combines conditional inputs with a directed acyclic graph (DAG) structure, supported by a set of additional methods that improve synthesis performance. Applied to New York State, the framework produces a synthetic population of nearly 20 million individuals in 7.5 million households. Its marginals match the census marginals well while preserving associations among household members similar to those in the sample, and the data is 17% more diverse than the Public Use Microdata Sample (PUMS) and 13% more diverse than a Popgen-based benchmark.

📝 Abstract
Synthetic populations are an increasingly important resource in areas such as urban and transportation analysis. Traditional methods such as iterative proportional fitting (IPF) cannot generate high-quality data from high-dimensional datasets. Recent population synthesis methods based on deep learning can overcome this curse of dimensionality. However, these methods offer few controls over generation, and few of them have been used to generate synthetic populations that capture associations among members of the same household. In this study, we propose a framework that tackles these issues. At its core is a novel population synthesis model, the conditional input directed acyclic tabular generative adversarial network (ciDATGAN), supported by a set of methods that enhance population synthesis performance. We apply the model to generate a synthetic population for all of New York State as a public resource for researchers and policymakers. The synthetic population includes nearly 20 million individuals and 7.5 million households. Its marginals match the census marginals well, and it maintains associations among household members similar to those in the sample. The synthetic population is 17% more diverse than the PUMS data and 13% more diverse than a benchmark approach based on Popgen. This study provides an approach that combines multiple methods to enhance the population synthesis procedure with greater awareness of equity and diversity.
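
For readers unfamiliar with the IPF baseline the abstract refers to, the following is a minimal two-attribute sketch in plain Python. It is not the paper's code: the function name `ipf_2d`, the seed counts, and the target totals are all hypothetical and chosen only for illustration. IPF alternately rescales the rows and columns of a seed contingency table until its sums match the target marginals. Because the table needs one cell per combination of attribute values, its size grows multiplicatively with each added attribute, which is the dimensionality problem the abstract mentions.

```python
def ipf_2d(seed, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Fit a 2-D seed table to target row and column totals (illustrative IPF)."""
    table = [[float(cell) for cell in row] for row in seed]
    for _ in range(max_iter):
        # Row step: scale each row so its sum matches the row target.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            if row_sum > 0:
                factor = target / row_sum
                table[i] = [cell * factor for cell in table[i]]
        # Column step: scale each column so its sum matches the column target.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            if col_sum > 0:
                factor = target / col_sum
                for row in table:
                    row[j] *= factor
        # Columns match after the column step, so only row sums need checking.
        max_err = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if max_err < tol:
            break
    return table


# Hypothetical seed: sample counts for two attributes with two levels each.
seed = [[10, 20], [30, 40]]
row_targets = [40.0, 60.0]  # assumed census totals for the row attribute
col_targets = [45.0, 55.0]  # assumed census totals for the column attribute
fitted = ipf_2d(seed, row_targets, col_targets)
```

The fitted table reproduces both sets of target totals while keeping the association between the two attributes (the cross-product ratio) found in the seed. It operates on cell counts for individual records, so on its own it says nothing about how members of the same household relate to each other.
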
Problem

Research questions and friction points this paper is trying to address.

Generating high-dimensional synthetic household population data
Capturing associations among household members accurately
Improving diversity and equity in population synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses ciDATGAN for synthetic population generation
Enhances diversity with multi-method framework
Matches census data while preserving household associations
Hai Yang
C2SMARTER Center, Department of Civil & Urban Engineering, New York University Tandon School of Engineering
Hongying Wu
McGill University, New York University, University of Washington
Transit Planning, Urban Planning, Travel Behavior, Ridership, Equity
Linfei Yuan
C2SMARTER Center, Department of Civil & Urban Engineering, New York University Tandon School of Engineering
Xiyuan Ren
New York University
Travel Behavior, Shared Mobility, Urban Data Analytics, Green Space Planning
Joseph Y. J. Chow
New York University
Behavioral informatics, urban transportation systems
Jingqin Gao
C2SMARTER Center, Department of Civil & Urban Engineering, New York University Tandon School of Engineering
Kaan Ozbay
Professor of Civil and Urban Engineering, New York University
Intelligent Transportation Systems, Connected and Autonomous Vehicle, Traffic Control, Traffic Safety, Transportation Networks