SemaPop: Semantic-Persona Conditioned Population Synthesis

2026-02-12
AI Summary
Existing synthetic population generation methods struggle to ensure statistical consistency while also enabling controllable generation with behavioral semantics. To address this challenge, this work proposes SemaPop, which integrates high-level persona representations extracted by large language models as semantic conditioning into a WGAN-GP-based generative framework, augmented with marginal regularization constraints. By jointly modeling abstract behavioral patterns and multidimensional statistical structures, SemaPop achieves semantically guided yet statistically consistent population synthesis. The approach improves the fidelity of generated populations to real-world data in both marginal and joint distributions, while preserving sample diversity and feasibility. Consequently, it enhances the controllability and interpretability of synthetic populations without compromising their realism or structural integrity.
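
One way to read the combination described above is as a conditional WGAN-GP with an extra generator-side penalty. This is a sketch, not the paper's stated objective. The critic loss below is the standard WGAN-GP form. The persona embedding $s$ as a conditioning input, the L1 form of the marginal term, and the weights $\lambda_{gp}$ and $\lambda_{m}$ are assumptions introduced here for illustration.

```latex
% Critic loss: standard WGAN-GP, with an assumed conditioning input s
\mathcal{L}_D =
  \mathbb{E}_{z,\,s}\big[D(G(z,s),\,s)\big]
  - \mathbb{E}_{(x,\,s)}\big[D(x,\,s)\big]
  + \lambda_{gp}\,\mathbb{E}_{\hat{x},\,s}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x},s) \rVert_2 - 1\big)^2\Big]

% Generator loss: adversarial term plus an assumed L1 marginal regularizer
\mathcal{L}_G =
  - \mathbb{E}_{z,\,s}\big[D(G(z,s),\,s)\big]
  + \lambda_{m} \sum_{k} \big\lVert \hat{m}_k - m_k \big\rVert_1
```

Here $\hat{x}$ is a random interpolate between a real and a generated sample, $m_k$ is the target marginal of attribute $k$, and $\hat{m}_k$ is the corresponding marginal estimated from a generated batch.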

Abstract
Population synthesis is a critical component of individual-level socio-economic simulation, yet remains challenging due to the need to jointly represent statistical structure and latent behavioral semantics. Existing population synthesis approaches predominantly rely on structured attributes and statistical constraints, leaving a gap in semantic-conditioned population generation that can capture abstract behavioral patterns implicit in survey data. This study proposes SemaPop, a semantic-statistical population synthesis model that integrates large language models (LLMs) with generative population modeling. SemaPop derives high-level persona representations from individual survey records and incorporates them as semantic conditioning signals for population generation, while marginal regularization is introduced to enforce alignment with target population marginals. In this study, the framework is instantiated using a Wasserstein GAN with gradient penalty (WGAN-GP) backbone, referred to as SemaPop-GAN. Extensive experiments demonstrate that SemaPop-GAN achieves improved generative performance, yielding closer alignment with target marginal and joint distributions while maintaining sample-level feasibility and diversity under semantic conditioning. Ablation studies further confirm the contribution of semantic persona conditioning and architectural design choices to balancing marginal consistency and structural realism. These results demonstrate that SemaPop-GAN enables controllable and interpretable population synthesis through effective semantic-statistical information fusion. SemaPop-GAN also provides a promising modular foundation for developing generative population projection systems that integrate individual-level behavioral semantics with population-level statistical constraints.
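
The marginal regularization mentioned in the abstract can be illustrated with a minimal NumPy sketch. It assumes each categorical attribute occupies a contiguous block of one-hot (or softmax) columns in the generator output, and that the penalty is the summed L1 gap between batch-level and target marginals. The function name `marginal_penalty`, the attribute names, and the L1 choice are hypothetical and not taken from the paper.

```python
import numpy as np

def marginal_penalty(gen_probs, attr_slices, target_marginals):
    """Sum over attributes of the L1 gap between generated and target marginals.

    gen_probs: (batch, total_categories) array of per-attribute probabilities.
    attr_slices: maps attribute name -> column slice of that attribute's block.
    target_marginals: maps attribute name -> target category shares.
    """
    total = 0.0
    for name, cols in attr_slices.items():
        # Batch-level marginal: average category probability across samples.
        gen_marginal = gen_probs[:, cols].mean(axis=0)
        total += np.abs(gen_marginal - target_marginals[name]).sum()
    return float(total)

# Toy batch of four generated individuals (hypothetical attributes).
gen_probs = np.array([
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 1.0, 0.0],
])
attr_slices = {"gender": slice(0, 2), "age_band": slice(2, 5)}
target_marginals = {
    "gender": np.array([0.5, 0.5]),
    "age_band": np.array([0.25, 0.5, 0.25]),
}

penalty = marginal_penalty(gen_probs, attr_slices, target_marginals)
print(penalty)
```

In this toy batch the age-band shares already match the target, so the whole penalty comes from the gender block. In a training loop, a term of this kind would be computed on differentiable generator outputs and added to the generator loss with a weight.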
Problem

Research questions and friction points this paper is trying to address.

population synthesis
semantic conditioning
behavioral semantics
persona representation
individual-level simulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

semantic-conditioned generation
persona representation
population synthesis
large language models
WGAN-GP
Zhenlin Qin
Department of Civil and Architectural Engineering, KTH Royal Institute of Technology, Sweden
Yancheng Ling
Department of Civil and Architectural Engineering, KTH Royal Institute of Technology, Sweden
Leizhen Wang
Monash University
Reinforcement learning, LLMs, Intelligent transportation systems
Francisco Câmara Pereira
Department of Technology, Management and Economics Intelligent Transportation Systems, Technical University of Denmark, Denmark
Zhenliang Ma
Associate Professor @ KTH Royal Institute of Technology
Applied Artificial Intelligence, Intelligent Transportation Systems, Multimodal Transportation