Yo'City: Personalized and Boundless 3D Realistic City Scene Generation via Self-Critic Expansion

📅 2025-11-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 3D urban generation methods rely on monolithic diffusion models, limiting both personalization and scalable expansion. To address this, we propose a top-down hierarchical planning framework, “City-District-Grid”, that integrates large language model (LLM)-driven reasoning to enable user-guided, customizable design and continuous urban evolution. Our key innovation is a relation-guided interactive expansion mechanism, incorporating scene-graph-aware distance constraints and semantic layout optimization to ensure spatial coherence. We further introduce a multi-dimensional evaluation benchmark covering semantic fidelity, geometric accuracy, texture quality, and layout plausibility, with six quantitative metrics. Leveraging a “produce-refine-evaluate” isometric image synthesis loop and image-to-3D reconstruction, our method jointly synthesizes hierarchical structure and local details. Experiments demonstrate state-of-the-art performance across generation quality, scalability, and user controllability.

📝 Abstract
Realistic 3D city generation is fundamental to a wide range of applications, including virtual reality and digital twins. However, most existing methods rely on training a single diffusion model, which limits their ability to generate personalized and boundless city-scale scenes. In this paper, we present Yo'City, a novel agentic framework that enables user-customized and infinitely expandable 3D city generation by leveraging the reasoning and compositional capabilities of off-the-shelf large models. Specifically, Yo'City first conceptualizes the city through a top-down planning strategy that defines a hierarchical "City-District-Grid" structure. The Global Planner determines the overall layout and potential functional districts, while the Local Designer further refines each district with detailed grid-level descriptions. Subsequently, the grid-level 3D generation is achieved through a "produce-refine-evaluate" isometric image synthesis loop, followed by image-to-3D generation. To simulate continuous city evolution, Yo'City further introduces a user-interactive, relationship-guided expansion mechanism, which performs scene graph-based distance- and semantics-aware layout optimization, ensuring spatially coherent city growth. To comprehensively evaluate our method, we construct a diverse benchmark dataset and design six multi-dimensional metrics that assess generation quality from the perspectives of semantics, geometry, texture, and layout. Extensive experiments demonstrate that Yo'City consistently outperforms existing state-of-the-art methods across all evaluation aspects.
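As a rough illustration of the abstract's "City-District-Grid" hierarchy and the "produce-refine-evaluate" loop, the following Python sketch may help. It is not the authors' implementation: the class names, the `produce`, `refine`, and `evaluate` callables, the `threshold`, and the `max_rounds` cap are all hypothetical stand-ins for whatever image generator, editor, and critic the framework actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Grid:
    position: Tuple[int, int]  # (row, col) cell within the city layout
    description: str           # grid-level text, as a Local Designer might write it


@dataclass
class District:
    name: str
    function: str              # e.g. "residential" or "commercial" (illustrative labels)
    grids: List[Grid] = field(default_factory=list)


@dataclass
class City:
    prompt: str                # the user's request for the whole city
    districts: List[District] = field(default_factory=list)


def produce_refine_evaluate(
    description: str,
    produce: Callable[[str], str],          # hypothetical: text -> image handle
    refine: Callable[[str, str], str],      # hypothetical: (text, image) -> improved image
    evaluate: Callable[[str, str], float],  # hypothetical: (text, image) -> score in [0, 1]
    threshold: float = 0.8,                 # assumed acceptance score
    max_rounds: int = 3,                    # assumed cap on refinement rounds
) -> Tuple[str, float]:
    """Produce a candidate, then refine it until the critic's score passes the threshold."""
    best = produce(description)
    best_score = evaluate(description, best)
    for _ in range(max_rounds):
        if best_score >= threshold:
            break
        candidate = refine(description, best)
        score = evaluate(description, candidate)
        # Keep a refinement only if the critic rates it higher than the current best.
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

In this sketch, each `Grid.description` would be passed through `produce_refine_evaluate`, and the accepted image would then go to an image-to-3D model, which is outside the scope of the example.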
Problem

Research questions and friction points this paper is trying to address.

Generating personalized and boundless 3D city scenes, which methods built on a single diffusion model struggle to do
Creating user-customized infinite city expansion with coherent spatial relationships
Evaluating 3D city generation quality across semantics, geometry, texture, and layout
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical City-District-Grid planning structure
Produce-refine-evaluate isometric image synthesis loop, followed by image-to-3D generation
Scene graph-based expansion for coherent growth
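To make the idea of scene graph-based expansion more concrete, here is a minimal Python sketch of distance-aware placement on a grid. It is an assumption-laden toy, not the paper's optimizer: the function names, the use of Manhattan distance, the squared-error cost, and the `desired_distance` mapping (a stand-in for scene-graph relations such as "near the park") are all hypothetical, and the semantic term of the layout optimization is omitted.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def frontier_cells(occupied: Dict[Cell, str]) -> List[Cell]:
    """Free cells that are 4-adjacent to at least one occupied cell, in sorted order."""
    frontier = set()
    for (r, c) in occupied:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            cell = (r + dr, c + dc)
            if cell not in occupied:
                frontier.add(cell)
    return sorted(frontier)


def placement_cost(
    cell: Cell,
    occupied: Dict[Cell, str],
    desired_distance: Dict[str, float],
) -> float:
    """Sum of squared gaps between actual and desired Manhattan distances."""
    cost = 0.0
    for (r, c), label in occupied.items():
        target = desired_distance.get(label)
        if target is None:
            continue  # no relation to this existing grid, so it adds no cost
        actual = abs(cell[0] - r) + abs(cell[1] - c)
        cost += (actual - target) ** 2
    return cost


def place_new_grid(
    occupied: Dict[Cell, str],
    desired_distance: Dict[str, float],
) -> Cell:
    """Pick the lowest-cost frontier cell; ties go to the first cell in sorted order."""
    candidates = frontier_cells(occupied)
    return min(candidates, key=lambda cell: placement_cost(cell, occupied, desired_distance))


# Example: a new grid that should sit next to the park and farther from the factory.
existing = {(0, 0): "park", (0, 1): "factory"}
chosen = place_new_grid(existing, {"park": 1, "factory": 3})
```

Restricting candidates to the frontier keeps the city contiguous as it grows; a fuller version would presumably add a semantic compatibility term to the cost, as the distance- and semantics-aware description suggests.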