Scalable Generative Game Engine: Breaking the Resolution Wall via Hardware-Algorithm Co-Design

📅 2026-01-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the “memory wall” bottleneck that prevents existing generative game engines from supporting high-resolution, real-time neural simulation. The authors propose a hardware-algorithm co-design approach that decouples the compute-bound world model from the memory-bound decoder and deploys them across a cluster of programmable AI accelerators. Key innovations include asymmetric resource allocation under sequence parallelism constraints, memory-centric operator fusion, and a manifold-aware latent extrapolation mechanism. With real-time generation at 720×480 resolution, the system achieves 26.4 FPS on a continuous 3D racing benchmark and 48.3 FPS on a discrete 2D platformer benchmark, a 50× increase in pixel throughput over prior baselines, with an amortized effective latency of 2.7 ms.
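The compute-bound versus memory-bound distinction is commonly framed with the roofline model, which compares a workload's arithmetic intensity (FLOPs per byte moved off-chip) with the accelerator's balance point (peak FLOP/s divided by peak bandwidth). The sketch below illustrates that general idea only. It is not the authors' analysis, and every number, name, and helper in it (`Workload`, `classify`, the peak figures) is a hypothetical placeholder rather than a value from the paper.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    flops: float        # floating-point operations per invocation (hypothetical)
    bytes_moved: float  # off-chip bytes read + written per invocation (hypothetical)


def arithmetic_intensity(w: Workload) -> float:
    """FLOPs performed per byte of off-chip traffic."""
    return w.flops / w.bytes_moved


def classify(w: Workload, peak_flops: float, peak_bandwidth: float) -> str:
    """Label a workload by comparing its intensity with the roofline ridge point."""
    ridge = peak_flops / peak_bandwidth  # FLOPs per byte at which the two limits meet
    return "compute-bound" if arithmetic_intensity(w) >= ridge else "memory-bound"


def attainable_flops(w: Workload, peak_flops: float, peak_bandwidth: float) -> float:
    """Roofline upper bound on sustained FLOP/s for this workload."""
    return min(peak_flops, arithmetic_intensity(w) * peak_bandwidth)


# Hypothetical accelerator: 100 TFLOP/s peak compute, 1 TB/s off-chip bandwidth.
PEAK_FLOPS = 100e12
PEAK_BANDWIDTH = 1e12

# Hypothetical per-frame costs, chosen only to land on either side of the ridge.
world_model = Workload("world_model", flops=4e12, bytes_moved=1e10)
decoder = Workload("decoder", flops=2e11, bytes_moved=1e10)

for w in (world_model, decoder):
    print(w.name, classify(w, PEAK_FLOPS, PEAK_BANDWIDTH))
```

Under these made-up figures the decoder sits well below the ridge point, so extra compute units would not speed it up while reducing off-chip traffic would. That is the kind of mismatch that motivates placing the two components on differently provisioned resources.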

📝 Abstract

Real-time generative game engines represent a paradigm shift in interactive simulation, promising to replace traditional graphics pipelines with neural world models. However, existing approaches are fundamentally constrained by the “Memory Wall,” restricting practical deployments to low resolutions (e.g., 64×64). This paper bridges the gap between generative models and high-resolution neural simulations by introducing a scalable Hardware-Algorithm Co-Design framework. We identify that high-resolution generation suffers from a critical resource mismatch: the World Model is compute-bound while the Decoder is memory-bound. To address this, we propose a heterogeneous architecture that decouples these components across a cluster of AI accelerators. Our system features three core innovations: (1) an asymmetric resource allocation strategy that optimizes throughput under sequence parallelism constraints; (2) a memory-centric operator fusion scheme that minimizes off-chip bandwidth usage; and (3) a manifold-aware latent extrapolation mechanism that exploits temporal redundancy to mask latency. We validate our approach on a cluster of programmable AI accelerators, enabling real-time generation at 720×480 resolution, a 50× increase in pixel throughput over prior baselines. Evaluated on both continuous 3D racing and discrete 2D platformer benchmarks, our system delivers fluid 26.4 FPS and 48.3 FPS respectively, with an amortized effective latency of 2.7 ms. This work demonstrates that resolving the “Memory Wall” via architectural co-design is not merely an optimization, but a prerequisite for enabling high-fidelity, responsive neural gameplay.
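To put the reported figures in concrete terms, the short calculation below converts resolution and frame rate into pixels per second and per-frame intervals. It assumes both benchmarks run at 720×480, which the abstract implies but does not state per benchmark, and the function names are illustrative. The baseline behind the 50× figure is not specified here, so the sketch does not try to reproduce that ratio.

```python
# Figures quoted in the abstract.
WIDTH, HEIGHT = 720, 480
FPS_RACING = 26.4      # continuous 3D racing benchmark
FPS_PLATFORMER = 48.3  # discrete 2D platformer benchmark


def pixels_per_frame(width: int, height: int) -> int:
    """Number of pixels in one generated frame."""
    return width * height


def pixel_throughput(width: int, height: int, fps: float) -> float:
    """Pixels generated per second at a given resolution and frame rate."""
    return pixels_per_frame(width, height) * fps


def frame_interval_ms(fps: float) -> float:
    """Average time between consecutive frames, in milliseconds."""
    return 1000.0 / fps


# Assumption: both benchmarks are generated at 720x480.
for name, fps in (("3D racing", FPS_RACING), ("2D platformer", FPS_PLATFORMER)):
    throughput = pixel_throughput(WIDTH, HEIGHT, fps)
    interval = frame_interval_ms(fps)
    print(f"{name}: {throughput / 1e6:.2f} Mpixel/s, {interval:.1f} ms per frame")
```

At 26.4 FPS the frame interval is roughly 37.9 ms, and at 48.3 FPS roughly 20.7 ms. Both are far longer than the 2.7 ms amortized effective latency, which fits the abstract's description of latent extrapolation being used to mask latency rather than each frame being fully generated within 2.7 ms.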

Problem

Research questions and friction points this paper is trying to address.

Memory Wall

Generative Game Engine

High-resolution Neural Simulation

Real-time Generation

Neural World Models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hardware-Algorithm Co-Design

Memory Wall

Generative Game Engine