🤖 AI Summary
This work addresses the difficulty of simulating complex interactive environments with a neural model in real time over long trajectories. To this end, the authors introduce GameNGen, the first game engine powered entirely by a neural model. Trained on recorded DOOM gameplay, GameNGen uses a conditional diffusion model as an autoregressive engine core, predicting each frame from past frames and actions at 20 FPS on a single TPU. A conditioning augmentation mechanism and a decoder fine-tuning step keep generation stable and visually faithful over multi-minute sessions, reaching a next-frame PSNR of 29.4, comparable to lossy JPEG compression. Human raters distinguish its outputs from real gameplay only slightly better than chance. This work points toward a new paradigm of fully neural, interactive game engines.
📝 Abstract
We present GameNGen, the first game engine powered entirely by a neural model, enabling real-time interaction with a complex environment over long trajectories at high quality. Trained on the classic game DOOM, GameNGen learns from recorded gameplay to produce a playable environment that can interactively simulate new trajectories. GameNGen runs at 20 frames per second on a single TPU and remains stable over extended multi-minute play sessions. Next-frame prediction achieves a PSNR of 29.4, comparable to lossy JPEG compression. Human raters are only slightly better than random chance at distinguishing short clips of the game from clips of the simulation, even after 5 minutes of auto-regressive generation. GameNGen is trained in two phases: (1) an RL agent learns to play the game and its training sessions are recorded, and (2) a diffusion model is trained to produce the next frame, conditioned on the sequence of past frames and actions. Conditioning augmentations help ensure stable auto-regressive generation over long trajectories, and decoder fine-tuning improves the fidelity of visual details and text.
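The auto-regressive loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `denoise` is a hypothetical stub standing in for the conditional diffusion model, the frame size and context length are made-up toy values, and the Gaussian noise added to context frames is a simplified stand-in for the paper's conditioning augmentation (which trains the model to tolerate its own accumulated errors).

```python
import numpy as np

H, W, C = 8, 8, 3   # toy frame dimensions (illustrative, not from the paper)
CONTEXT = 4         # number of past frames the model conditions on (assumed)

rng = np.random.default_rng(0)

def denoise(context_frames, actions):
    """Stub for the diffusion model's denoising step.

    A real model would iteratively denoise a latent conditioned on the
    past frames and actions; here we just average the context frames.
    """
    return np.mean(context_frames, axis=0)

def simulate(initial_frames, actions, noise_std=0.1):
    """Auto-regressively roll out frames, conditioning on past frames/actions."""
    frames = list(initial_frames)
    for t, _action in enumerate(actions):
        context = np.stack(frames[-CONTEXT:])
        # Conditioning (noise) augmentation: corrupt the context so the
        # model stays stable when fed its own imperfect past predictions.
        context = context + rng.normal(0.0, noise_std, context.shape)
        next_frame = denoise(context, actions[: t + 1])
        frames.append(np.clip(next_frame, 0.0, 1.0))
    return frames

start = [rng.random((H, W, C)) for _ in range(CONTEXT)]
rollout = simulate(start, actions=["forward", "turn_left", "fire"])
```

Each generated frame re-enters the context window, which is exactly why error accumulation is the central stability challenge the conditioning augmentations address.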