Coarse-to-Real: Generative Rendering for Populated Dynamic Scenes

📅 2026-01-29
🤖 AI Summary
Traditional rendering approaches struggle to generate dense, dynamic urban scenes: they require complex assets, incur high computational costs, scale poorly, and still fall short of full realism. To address these limitations, this work proposes C2R, a framework that first uses a coarse 3D simulation to control scene layout, camera motion, and pedestrian trajectories, then applies a text-guided neural renderer to synthesize photorealistic appearance, lighting, and fine-grained dynamics. C2R is trained with a two-stage hybrid strategy that mixes CG and real-world videos, enabling implicit cross-domain sharing of spatio-temporal features without paired data and thereby balancing controllability against visual fidelity. Requiring only minimal 3D input, the system generates temporally consistent, high-fidelity, and controllable urban crowd videos, and it is compatible with mainstream CG and game engines.
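
To make the two-stage pipeline concrete, here is a minimal Python sketch of the coarse-to-real flow summarized above. Everything in it is a hypothetical placeholder rather than the authors' code: `CoarseScene`, `render_coarse_frames`, and `neural_render` are invented names, frames are represented as strings, and the neural renderer is a stub that only records the restyling step.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CoarseScene:
    """Minimal stand-in for a coarse 3D simulation state."""
    layout_id: str                                       # which scene layout to use
    camera_path: List[Tuple[float, float, float]]        # per-frame camera positions
    pedestrian_tracks: List[List[Tuple[float, float]]]   # per-agent trajectories


def render_coarse_frames(scene: CoarseScene, num_frames: int) -> List[str]:
    """Rasterize the coarse simulation into simple CG frames.

    Each 'frame' here is just a string token standing in for an image;
    a real pipeline would render untextured geometry from a CG or game engine.
    """
    return [f"coarse_frame[{scene.layout_id}:{t}]" for t in range(num_frames)]


def neural_render(coarse_frames: List[str], prompt: str) -> List[str]:
    """Map coarse frames to realistic frames, guided by a text prompt.

    A real system would condition a learned video generator on
    spatio-temporal features of the coarse input; this stub only marks
    each frame as having been restyled.
    """
    return [f"real_style({frame}, prompt={prompt!r})" for frame in coarse_frames]


if __name__ == "__main__":
    scene = CoarseScene(
        layout_id="downtown_block_03",
        camera_path=[(0.0, 1.7, float(t)) for t in range(16)],      # dolly forward
        pedestrian_tracks=[[(float(t), 0.0) for t in range(16)]],   # one walker
    )
    coarse = render_coarse_frames(scene, num_frames=16)
    video = neural_render(coarse, prompt="rainy evening street, dense crowd")
    print(video[0])
```

The split mirrors the summary's division of labor: the coarse stage owns everything that must be controllable (layout, camera, trajectories), while the learned stage owns everything that must be realistic (appearance, lighting, fine dynamics).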

📝 Abstract
Traditional rendering pipelines rely on complex assets, accurate materials and lighting, and substantial computational resources to produce realistic imagery, yet they still face challenges in scalability and realism for populated dynamic scenes. We present C2R (Coarse-to-Real), a generative rendering framework that synthesizes real-style urban crowd videos from coarse 3D simulations. Our approach uses coarse 3D renderings to explicitly control scene layout, camera motion, and human trajectories, while a learned neural renderer generates realistic appearance, lighting, and fine-scale dynamics guided by text prompts. To overcome the lack of paired training data between coarse simulations and real videos, we adopt a two-phase mixed CG-real training strategy that learns a strong generative prior from large-scale real footage and introduces controllability through shared implicit spatio-temporal features across domains. The resulting system supports coarse-to-fine control, generalizes across diverse CG and game inputs, and produces temporally consistent, controllable, and realistic urban scene videos from minimal 3D input. We will release the model and project webpage at https://gonzalognogales.github.io/coarse2real/.
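
As a concrete illustration of the two-phase mixed CG-real training described in the abstract, here is a minimal PyTorch sketch. The model, conditioning mechanism, denoising loss, and random tensors are all illustrative assumptions (`TinyVideoDenoiser` and its additive coarse-render injection are invented for this sketch, not the paper's architecture); only the phase structure, real-only pretraining followed by mixed CG/real batches with an optional coarse condition, mirrors the strategy stated above.

```python
# Illustrative only: model, shapes, and loss are stand-ins, not the paper's.
import torch
import torch.nn as nn


class TinyVideoDenoiser(nn.Module):
    """Stand-in for the text-guided video generator (e.g. a diffusion backbone)."""

    def __init__(self, channels: int = 8):
        super().__init__()
        self.backbone = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        # Conditioning branch for coarse CG renders, only used in phase 2.
        self.cond_proj = nn.Conv3d(channels, channels, kernel_size=1)

    def forward(self, noisy_video, coarse_cond=None):
        h = self.backbone(noisy_video)
        if coarse_cond is not None:
            # Inject the coarse render additively so CG and real clips share
            # the same spatio-temporal backbone without paired supervision.
            h = h + self.cond_proj(coarse_cond)
        return h


def denoising_loss(model, clean_clip, coarse_cond=None):
    """Toy denoising objective: predict the noise added to a clean clip."""
    noise = torch.randn_like(clean_clip)
    pred = model(clean_clip + noise, coarse_cond)
    return nn.functional.mse_loss(pred, noise)


model = TinyVideoDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
B, C, T, H, W = 2, 8, 4, 16, 16  # tiny batch/clip sizes for illustration

# Phase 1: learn a generative prior from real footage only (no conditioning).
for _ in range(2):
    real_clip = torch.randn(B, C, T, H, W)
    loss = denoising_loss(model, real_clip)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Phase 2: alternate CG and real batches. CG batches carry a coarse-render
# condition (controllability); real batches keep the realism prior anchored.
for step in range(4):
    clip = torch.randn(B, C, T, H, W)
    coarse = torch.randn(B, C, T, H, W) if step % 2 == 0 else None
    loss = denoising_loss(model, clip, coarse_cond=coarse)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The structural point is that phase 2 reuses the phase-1 weights, so conditioning on coarse renders is layered on top of an already strong realism prior rather than trained from scratch, which is how, per the abstract, controllability is introduced without paired coarse-to-real data.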
Problem

Research questions and friction points this paper is trying to address.

generative rendering
populated dynamic scenes
realistic urban videos
coarse-to-real synthesis
scalability in rendering
Innovation

Methods, ideas, or system contributions that make the work stand out.

generative rendering
coarse-to-real synthesis
neural rendering
text-guided video generation
cross-domain training
Gonzalo Gomez-Nogales
Universidad Rey Juan Carlos, Spain
Yicong Hong
Adobe Research
Video Generation, World Models, Embodied AI
Chongjian Ge
Adobe Research, USA
Marc Comino-Trinidad
Universidad Rey Juan Carlos, Spain
Dan Casas
Senior Applied Scientist, Amazon
Computer Vision, Computer Graphics, Machine Learning, 3D reconstruction
Yi Zhou
Roblox, USA