Imagine a City: CityGenAgent for Procedural 3D City Generation

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods for 3D urban generation are limited in creating high-fidelity assets, ensuring controllability, and supporting interactive editing. This work proposes CityGenAgent, a natural language-driven, hierarchical procedural generation framework that decomposes city synthesis into two interpretable programs: a block-level Block Program and a building-level Building Program, produced by the BlockGen and BuildingGen models respectively. Training combines supervised fine-tuning with reinforcement learning, where a Spatial Alignment Reward and a Visual Consistency Reward encourage structural plausibility and semantic fidelity in the generated outputs. According to the authors' evaluations, CityGenAgent outperforms existing methods in semantic alignment, visual quality, and controllability, and it supports natural language editing of generated cities.

📝 Abstract
The automated generation of interactive 3D cities is a critical challenge with broad applications in autonomous driving, virtual reality, and embodied intelligence. While recent advances in generative models and procedural techniques have improved the realism of city generation, existing methods often struggle with high-fidelity asset creation, controllability, and manipulation. In this work, we introduce CityGenAgent, a natural language-driven framework for hierarchical procedural generation of high-quality 3D cities. Our approach decomposes city generation into two interpretable components, the Block Program and the Building Program. To ensure structural correctness and semantic alignment, we adopt a two-stage learning strategy: (1) Supervised Fine-Tuning (SFT). We train BlockGen and BuildingGen to generate valid programs that adhere to schema constraints, including non-self-intersecting polygons and complete fields; (2) Reinforcement Learning (RL). We design a Spatial Alignment Reward to enhance spatial reasoning ability and a Visual Consistency Reward to bridge the gap between textual descriptions and the visual modality. Benefiting from the programs and the models' generalization, CityGenAgent supports natural language editing and manipulation. Comprehensive evaluations demonstrate superior semantic alignment, visual quality, and controllability compared to existing methods, establishing a robust foundation for scalable 3D city generation.
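The schema constraints named in the abstract (complete fields and non-self-intersecting polygons) can be illustrated with a small validity check. The sketch below is not the authors' implementation: the field names in `REQUIRED_FIELDS`, the dictionary layout of a block program, and the function names are all hypothetical, and the intersection test is simplified to detect only proper crossings between non-adjacent edges (collinear overlaps are not handled).

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical required fields; the paper's actual schema is not given here.
REQUIRED_FIELDS = ("block_id", "land_use", "polygon")


def _orient(a: Point, b: Point, c: Point) -> float:
    """Signed area of triangle abc; the sign gives the turn direction."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segments p1p2 and q1q2 cross at a point interior to both."""
    d1 = _orient(q1, q2, p1)
    d2 = _orient(q1, q2, p2)
    d3 = _orient(p1, p2, q1)
    d4 = _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def is_simple_polygon(vertices: Sequence[Point]) -> bool:
    """Check that no two non-adjacent edges of the closed polygon cross."""
    n = len(vertices)
    if n < 3:
        return False
    edges = [(vertices[i], vertices[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Adjacent edges share a vertex and are skipped.
            if j == i + 1 or (i == 0 and j == n - 1):
                continue
            if _segments_cross(*edges[i], *edges[j]):
                return False
    return True


def validate_block_program(program: Dict) -> List[str]:
    """Return a list of schema violations; an empty list means valid."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS
              if name not in program]
    polygon = program.get("polygon")
    if polygon is not None and not is_simple_polygon(polygon):
        errors.append("polygon is self-intersecting or has fewer than 3 vertices")
    return errors


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
bowtie = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]

valid_errors = validate_block_program(
    {"block_id": 1, "land_use": "residential", "polygon": square})
invalid_errors = validate_block_program({"block_id": 2, "polygon": bowtie})
```

A check of this kind could serve as a filter on SFT training data or as a validity gate before a reward is computed, though the abstract does not say how the authors enforce the constraints.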
Problem

Research questions and friction points this paper is trying to address.

procedural generation
3D city
controllability
high-fidelity assets
semantic alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

procedural generation
natural language-driven
hierarchical modeling
reinforcement learning
3D city generation