Generative World Renderer

📅 2026-04-02
🤖 AI Summary
Existing synthetic datasets suffer from insufficient photorealism and temporal coherence, limiting the applicability of generative inverse and forward rendering in real-world scenarios. This work proposes a large-scale dynamic video dataset derived from AAA-grade games, employing a dual-screen capture method to simultaneously acquire RGB frames and five-channel G-buffers, thereby disentangling geometry from material properties. Building upon this dataset, we introduce a ground-truth-free evaluation protocol leveraging vision-language models (VLMs) and develop a text-prompt-driven framework for G-buffer style editing. Experiments demonstrate that our approach significantly outperforms existing methods in cross-dataset generalization and controllable generation, with VLM-based evaluations showing strong alignment with human judgments. We also release an open-source toolkit enabling real-time, high-fidelity style transfer for game rendering.
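The ground-truth-free VLM evaluation described above could, in spirit, be sketched as follows: a hypothetical `query_vlm` callable scores a clip along the three consistency axes the summary mentions, and scores are averaged over the dataset. The function name, prompts, and 1-5 score scale are illustrative assumptions, not the paper's actual protocol.

```python
from statistics import mean

# Three consistency axes the protocol measures, per the summary above.
AXES = ["semantic", "spatial", "temporal"]

# Hypothetical prompts; the paper's actual wording is not reproduced here.
PROMPTS = {
    "semantic": "Rate 1-5 how well materials/geometry match the scene content.",
    "spatial": "Rate 1-5 how spatially consistent the decomposed buffers are.",
    "temporal": "Rate 1-5 how stable the prediction is across frames.",
}

def evaluate_clip(frames, query_vlm):
    """Score one clip on each axis with a user-supplied VLM callable.

    `query_vlm(frames, prompt)` is assumed to return a float in [1, 5];
    a real VLM API would be wrapped to fit this signature.
    """
    return {axis: query_vlm(frames, PROMPTS[axis]) for axis in AXES}

def evaluate_dataset(clips, query_vlm):
    """Average per-axis scores over all clips."""
    per_clip = [evaluate_clip(c, query_vlm) for c in clips]
    return {axis: mean(s[axis] for s in per_clip) for axis in AXES}

# Usage with a stub VLM that always answers 4.0:
stub = lambda frames, prompt: 4.0
scores = evaluate_dataset([["f0", "f1"], ["f2", "f3"]], stub)
print(scores)  # {'semantic': 4.0, 'spatial': 4.0, 'temporal': 4.0}
```

In a real setup, the per-axis VLM scores would then be correlated against human ratings, which is how the paper reports alignment with human judgment.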
📝 Abstract
Scaling generative inverse and forward rendering to real-world scenarios is bottlenecked by the limited realism and temporal coherence of existing synthetic datasets. To bridge this persistent domain gap, we introduce a large-scale, dynamic dataset curated from visually complex AAA games. Using a novel dual-screen stitched capture method, we extract 4M continuous frames (720p/30 FPS) of synchronized RGB and five G-buffer channels across diverse scenes, visual effects, and environments, including adverse-weather and motion-blur variants. This dataset uniquely advances bidirectional rendering: enabling robust in-the-wild geometry and material decomposition, and facilitating high-fidelity G-buffer-guided video generation. Furthermore, to evaluate the real-world performance of inverse rendering without ground truth, we propose a novel VLM-based assessment protocol measuring semantic, spatial, and temporal consistency. Experiments demonstrate that inverse renderers fine-tuned on our data achieve superior cross-dataset generalization and controllable generation, while our VLM evaluation strongly correlates with human judgment. Combined with our toolkit, our forward renderer lets users edit the rendering style of AAA game footage from G-buffers using text prompts.
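As a rough illustration of the dual-screen stitched capture, the sketch below splits a horizontally stitched frame into its RGB half and five named G-buffer tiles. The layout (RGB left, G-buffers tiled vertically on the right), the channel names, and the helper itself are assumptions for illustration, not the paper's actual capture format.

```python
import numpy as np

# Hypothetical channel names -- a typical five-channel G-buffer split,
# not necessarily the one used in the paper.
G_BUFFER_NAMES = ["albedo", "normal", "depth", "roughness", "metallic"]

def split_stitched_frame(frame: np.ndarray):
    """Split a (H, 2*W, 3) stitched capture into RGB and named G-buffer tiles.

    Assumed layout: left half is the rendered RGB image; the right half
    stacks the five G-buffer channels vertically in equal-height tiles.
    """
    h, w2, _ = frame.shape
    w = w2 // 2
    rgb = frame[:, :w]                 # left half: rendered RGB
    gbuf_strip = frame[:, w:]          # right half: stacked G-buffer tiles
    tile_h = h // len(G_BUFFER_NAMES)
    gbuffers = {
        name: gbuf_strip[i * tile_h:(i + 1) * tile_h]
        for i, name in enumerate(G_BUFFER_NAMES)
    }
    return rgb, gbuffers

# Tiny synthetic example at a reduced resolution (a real 720p frame
# would be (720, 2560, 3) under this assumed side-by-side layout).
frame = np.zeros((10, 16, 3), dtype=np.uint8)
rgb, gbuffers = split_stitched_frame(frame)
print(rgb.shape, sorted(gbuffers))  # (10, 8, 3) plus the five channel names
```

Capturing both halves in the same frame is what keeps RGB and G-buffers synchronized per timestep, which the dataset relies on for temporally coherent supervision.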
Problem

Research questions and friction points this paper is trying to address.

inverse rendering
forward rendering
domain gap
temporal coherence
realism
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative Rendering
G-buffer
Inverse Rendering
VLM-based Evaluation
Synthetic Dataset