Scalable Generative Game Engine: Breaking the Resolution Wall via Hardware-Algorithm Co-Design

📅 2026-01-31
🤖 AI Summary
This work addresses the “memory wall” bottleneck that prevents existing generative game engines from supporting high-resolution, real-time neural simulation. The authors propose a hardware-algorithm co-design approach that decouples the compute-bound world model from the memory-bound decoder and deploys the system on a cluster of programmable AI accelerators. Key innovations include asymmetric resource allocation under a heterogeneous architecture, memory-centric operator fusion, and a manifold-aware latent extrapolation mechanism. Evaluated on representative benchmarks, the system runs at 720×480 resolution, reaching 26.4 FPS on a 3D racing game and 48.3 FPS on a 2D platformer, a 50× increase in pixel throughput over prior baselines, with an amortized effective latency of 2.7 ms.
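The reported figures can be put into more concrete units with a little arithmetic. The sketch below is only an illustration: it takes the resolution and frame rates quoted in the summary and derives pixels per second and the interval between frames. The function and variable names are made up for this example and do not come from the paper.

```python
# Figures quoted in the summary; nothing here is measured or taken from the
# authors' code. All names below are illustrative.
WIDTH, HEIGHT = 720, 480
REPORTED_FPS = {"3d_racing": 26.4, "2d_platformer": 48.3}


def pixel_throughput(width: int, height: int, fps: float) -> float:
    """Pixels generated per second at the given resolution and frame rate."""
    return width * height * fps


def frame_interval_ms(fps: float) -> float:
    """Time between consecutive frames, in milliseconds."""
    return 1000.0 / fps


throughput = {
    name: pixel_throughput(WIDTH, HEIGHT, fps) for name, fps in REPORTED_FPS.items()
}
interval = {name: frame_interval_ms(fps) for name, fps in REPORTED_FPS.items()}

for name in REPORTED_FPS:
    print(
        f"{name}: {throughput[name] / 1e6:.2f} Mpixel/s, "
        f"{interval[name]:.1f} ms per frame"
    )
```

With these inputs the racing benchmark works out to roughly 9.1 million pixels per second and the platformer to roughly 16.7 million, with frame intervals of about 37.9 ms and 20.7 ms. The 2.7 ms figure is well below either interval, which fits the abstract's description of it as an amortized effective latency and not a per-frame generation time.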

📝 Abstract
Real-time generative game engines represent a paradigm shift in interactive simulation, promising to replace traditional graphics pipelines with neural world models. However, existing approaches are fundamentally constrained by the ``Memory Wall,'' restricting practical deployments to low resolutions (e.g., $64 \times 64$). This paper bridges the gap between generative models and high-resolution neural simulations by introducing a scalable \textit{Hardware-Algorithm Co-Design} framework. We identify that high-resolution generation suffers from a critical resource mismatch: the World Model is compute-bound while the Decoder is memory-bound. To address this, we propose a heterogeneous architecture that intelligently decouples these components across a cluster of AI accelerators. Our system features three core innovations: (1) an asymmetric resource allocation strategy that optimizes throughput under sequence parallelism constraints; (2) a memory-centric operator fusion scheme that minimizes off-chip bandwidth usage; and (3) a manifold-aware latent extrapolation mechanism that exploits temporal redundancy to mask latency. We validate our approach on a cluster of programmable AI accelerators, enabling real-time generation at $720 \times 480$ resolution -- a $50\times$ increase in pixel throughput over prior baselines. Evaluated on both continuous 3D racing and discrete 2D platformer benchmarks, our system delivers fluid 26.4 FPS and 48.3 FPS respectively, with an amortized effective latency of 2.7 ms. This work demonstrates that resolving the ``Memory Wall'' via architectural co-design is not merely an optimization, but a prerequisite for enabling high-fidelity, responsive neural gameplay.
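The compute-bound versus memory-bound distinction in the abstract can be illustrated with the standard roofline model, in which a workload is memory-bound when its arithmetic intensity (FLOPs per byte moved) falls below the hardware's ratio of peak compute to peak bandwidth. The sketch below is a generic illustration of that model, not the authors' analysis. Every number in it (peak FLOP/s, bandwidth, per-stage FLOPs and bytes) is a made-up placeholder chosen only to show the two regimes.

```python
def ridge_point(peak_flops: float, peak_bandwidth: float) -> float:
    """Arithmetic intensity (FLOP/byte) at which the roofline bends."""
    return peak_flops / peak_bandwidth


def classify(
    flops: float, bytes_moved: float, peak_flops: float, peak_bandwidth: float
) -> str:
    """Label a workload as compute-bound or memory-bound under the roofline model."""
    intensity = flops / bytes_moved
    if intensity >= ridge_point(peak_flops, peak_bandwidth):
        return "compute-bound"
    return "memory-bound"


# HYPOTHETICAL accelerator: 100 TFLOP/s peak compute, 1 TB/s off-chip bandwidth.
PEAK_FLOPS = 100e12
PEAK_BANDWIDTH = 1e12

# HYPOTHETICAL per-frame workloads, not taken from the paper.
world_model = classify(
    flops=2e12, bytes_moved=5e9, peak_flops=PEAK_FLOPS, peak_bandwidth=PEAK_BANDWIDTH
)
decoder = classify(
    flops=1e11, bytes_moved=2e10, peak_flops=PEAK_FLOPS, peak_bandwidth=PEAK_BANDWIDTH
)

print("world model:", world_model)
print("decoder:", decoder)
```

Under this model, reducing the bytes a memory-bound stage moves off-chip raises its arithmetic intensity, which is the general motivation for techniques such as operator fusion.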
Problem

Research questions and friction points this paper is trying to address.

Memory Wall
Generative Game Engine
High-resolution Neural Simulation
Real-time Generation
Neural World Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hardware-Algorithm Co-Design
Memory Wall
Generative Game Engine
Latent Extrapolation
Operator Fusion
Wei Zeng
School of Microelectronics, Southern University of Science and Technology (SUSTech), Shenzhen, China; Zhongguancun Academy, Beijing, China
Xuchen Li
Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing, China; Zhongguancun Academy, Beijing, China
Ruili Feng
University of Waterloo, Waterloo, Canada
Zhen Liu
School of Marxism, Tsinghua University, Beijing, China; Zhongguancun Academy, Beijing, China
Fengwei An
Southern University of Science and Technology
Jian Zhao
Zhongguancun Institute of Artificial Intelligence
Reinforcement Learning, Multi-Agent System