Coarse-to-Real: Generative Rendering for Populated Dynamic Scenes

📅 2026-01-29
🤖 AI Summary
Traditional rendering approaches struggle to generate dense, dynamic urban scenes: they require complex assets, incur high computational costs, scale poorly, and still fall short of full realism. To address these limitations, this work proposes C2R, a framework that first uses a coarse 3D simulation to control scene layout, camera motion, and pedestrian trajectories, then applies a text-guided neural renderer to synthesize photorealistic appearance, lighting, and fine-grained dynamics. C2R is trained with a two-stage hybrid strategy that mixes CG and real-world videos, enabling implicit cross-domain sharing of spatio-temporal features without paired data and thereby balancing controllability against visual fidelity. Requiring only minimal 3D input, the system generates temporally consistent, high-fidelity, and controllable urban crowd videos, and it is compatible with mainstream CG and game engines.
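
To make the two-stage pipeline concrete, here is a minimal Python sketch of the coarse-to-real flow summarized above. Everything in it is a hypothetical placeholder rather than the authors' code: `CoarseScene`, `render_coarse_frames`, and `neural_render` are invented names, frames are represented as strings, and the neural renderer is a stub that only records the restyling step.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CoarseScene:
    """Minimal stand-in for a coarse 3D simulation state."""
    layout_id: str                                       # which scene layout to use
    camera_path: List[Tuple[float, float, float]]        # per-frame camera positions
    pedestrian_tracks: List[List[Tuple[float, float]]]   # per-agent trajectories


def render_coarse_frames(scene: CoarseScene, num_frames: int) -> List[str]:
    """Rasterize the coarse simulation into simple CG frames.

    Each 'frame' here is just a string token standing in for an image;
    a real pipeline would render untextured geometry from a CG or game engine.
    """
    return [f"coarse_frame[{scene.layout_id}:{t}]" for t in range(num_frames)]


def neural_render(coarse_frames: List[str], prompt: str) -> List[str]:
    """Map coarse frames to realistic frames, guided by a text prompt.

    A real system would condition a learned video generator on
    spatio-temporal features of the coarse input; this stub only marks
    each frame as having been restyled.
    """
    return [f"real_style({frame}, prompt={prompt!r})" for frame in coarse_frames]


if __name__ == "__main__":
    scene = CoarseScene(
        layout_id="downtown_block_03",
        camera_path=[(0.0, 1.7, float(t)) for t in range(16)],      # dolly forward
        pedestrian_tracks=[[(float(t), 0.0) for t in range(16)]],   # one walker
    )
    coarse = render_coarse_frames(scene, num_frames=16)
    video = neural_render(coarse, prompt="rainy evening street, dense crowd")
    print(video[0])
```

The split mirrors the summary's division of labor: the coarse stage owns everything that must be controllable (layout, camera, trajectories), while the learned stage owns everything that must be realistic (appearance, lighting, fine dynamics).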

📝 Abstract
Traditional rendering pipelines rely on complex assets, accurate materials and lighting, and substantial computational resources to produce realistic imagery, yet they still face challenges in scalability and realism for populated dynamic scenes. We present C2R (Coarse-to-Real), a generative rendering framework that synthesizes real-style urban crowd videos from coarse 3D simulations. Our approach uses coarse 3D renderings to explicitly control scene layout, camera motion, and human trajectories, while a learned neural renderer generates realistic appearance, lighting, and fine-scale dynamics guided by text prompts. To overcome the lack of paired training data between coarse simulations and real videos, we adopt a two-phase mixed CG-real training strategy that learns a strong generative prior from large-scale real footage and introduces controllability through shared implicit spatio-temporal features across domains. The resulting system supports coarse-to-fine control, generalizes across diverse CG and game inputs, and produces temporally consistent, controllable, and realistic urban scene videos from minimal 3D input. We will release the model and project webpage at https://gonzalognogales.github.io/coarse2real/.
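
As a concrete illustration of the two-phase mixed CG-real training described in the abstract, here is a minimal PyTorch sketch. The model, conditioning mechanism, denoising loss, and random tensors are all illustrative assumptions (`TinyVideoDenoiser` and its additive coarse-render injection are invented for this sketch, not the paper's architecture); only the phase structure, real-only pretraining followed by mixed CG/real batches with an optional coarse condition, mirrors the strategy stated above.

```python
# Illustrative only: model, shapes, and loss are stand-ins, not the paper's.
import torch
import torch.nn as nn


class TinyVideoDenoiser(nn.Module):
    """Stand-in for the text-guided video generator (e.g. a diffusion backbone)."""

    def __init__(self, channels: int = 8):
        super().__init__()
        self.backbone = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        # Conditioning branch for coarse CG renders, only used in phase 2.
        self.cond_proj = nn.Conv3d(channels, channels, kernel_size=1)

    def forward(self, noisy_video, coarse_cond=None):
        h = self.backbone(noisy_video)
        if coarse_cond is not None:
            # Inject the coarse render additively so CG and real clips share
            # the same spatio-temporal backbone without paired supervision.
            h = h + self.cond_proj(coarse_cond)
        return h


def denoising_loss(model, clean_clip, coarse_cond=None):
    """Toy denoising objective: predict the noise added to a clean clip."""
    noise = torch.randn_like(clean_clip)
    pred = model(clean_clip + noise, coarse_cond)
    return nn.functional.mse_loss(pred, noise)


model = TinyVideoDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
B, C, T, H, W = 2, 8, 4, 16, 16  # tiny batch/clip sizes for illustration

# Phase 1: learn a generative prior from real footage only (no conditioning).
for _ in range(2):
    real_clip = torch.randn(B, C, T, H, W)
    loss = denoising_loss(model, real_clip)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Phase 2: alternate CG and real batches. CG batches carry a coarse-render
# condition (controllability); real batches keep the realism prior anchored.
for step in range(4):
    clip = torch.randn(B, C, T, H, W)
    coarse = torch.randn(B, C, T, H, W) if step % 2 == 0 else None
    loss = denoising_loss(model, clip, coarse_cond=coarse)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The structural point is that phase 2 reuses the phase-1 weights, so conditioning on coarse renders is layered on top of an already strong realism prior rather than trained from scratch, which is how, per the abstract, controllability is introduced without paired coarse-to-real data.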
Problem

Research questions and friction points this paper is trying to address.

generative rendering
populated dynamic scenes
realistic urban videos
coarse-to-real synthesis
scalability in rendering
Innovation

Methods, ideas, or system contributions that make the work stand out.

generative rendering
coarse-to-real synthesis
neural rendering
text-guided video generation
cross-domain training
Gonzalo Gomez-Nogales
Universidad Rey Juan Carlos, Spain
Yicong Hong
Adobe Research
Video Generation, World Models, Embodied AI
Chongjian Ge
Adobe Research, USA
Marc Comino-Trinidad
Universidad Rey Juan Carlos, Spain
Dan Casas
Senior Applied Scientist, Amazon
Computer Vision, Computer Graphics, Machine Learning, 3D reconstruction
Yi Zhou
Roblox, USA