From Virtual Games to Real-World Play

📅 2025-06-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses three key challenges in interactive video generation: limited visual realism, temporal inconsistency, and reliance on costly real-world motion annotations. To this end, the authors propose RealPlay, the first neural-network-based real-world game engine. Methodologically, they introduce two generalization mechanisms, control transfer and entity transfer, which map virtual game control signals (e.g., keyboard or controller inputs) directly onto diverse real-world entities (e.g., bicycles, pedestrians) without requiring ground-truth motion labels. They further design a chunk-wise iterative prediction framework jointly trained on labeled game data and unlabeled real-world videos, incorporating explicit temporal-consistency constraints and a low-latency feedback loop. Experiments demonstrate that RealPlay generates interactive videos with high visual fidelity, strong control responsiveness, and robust temporal coherence, operating stably across multiple real-world scenarios. These results validate the effectiveness and cross-domain generalizability of control transfer from synthetic to physical environments.

📝 Abstract
We introduce RealPlay, a neural network-based real-world game engine that enables interactive video generation from user control signals. Unlike prior works focused on game-style visuals, RealPlay aims to produce photorealistic, temporally consistent video sequences that resemble real-world footage. It operates in an interactive loop: users observe a generated scene, issue a control command, and receive a short video chunk in response. To enable such realistic and responsive generation, we address key challenges including iterative chunk-wise prediction for low-latency feedback, temporal consistency across iterations, and accurate control response. RealPlay is trained on a combination of labeled game data and unlabeled real-world videos, without requiring real-world action annotations. Notably, we observe two forms of generalization: (1) control transfer: RealPlay effectively maps control signals from virtual to real-world scenarios; and (2) entity transfer: although training labels originate solely from a car racing game, RealPlay generalizes to control diverse real-world entities, including bicycles and pedestrians, beyond vehicles. The project page can be found at https://wenqsun.github.io/RealPlay/
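The interactive loop described in the abstract (observe scene, issue command, receive a short video chunk) can be sketched as chunk-wise iterative prediction. This is a minimal toy illustration, not the paper's actual model or API: `predict_chunk`, `CHUNK_LEN`, and the string-based "frames" are all hypothetical stand-ins.

```python
# Sketch of a chunk-wise interactive generation loop (illustrative only).
# Each user command yields one short chunk, conditioned on all prior frames,
# mimicking how conditioning on history supports temporal consistency.
from typing import List

CHUNK_LEN = 8  # frames per generated chunk (assumed value)

def predict_chunk(history: List[str], control: str) -> List[str]:
    """Stand-in for the video generator: produces CHUNK_LEN 'frames'
    that depend on the control signal and on the existing history."""
    start = len(history)
    return [f"frame{start + i}:{control}" for i in range(CHUNK_LEN)]

def interactive_rollout(controls: List[str]) -> List[str]:
    """Chunk-wise iterative prediction: each command appends one chunk."""
    video: List[str] = []
    for control in controls:
        chunk = predict_chunk(video, control)
        video.extend(chunk)  # chunk is returned to the user immediately
    return video

video = interactive_rollout(["left", "forward", "right"])
print(len(video))  # 24 frames: 3 commands x 8-frame chunks
```

Returning each chunk as soon as it is generated, rather than rendering a full sequence up front, is what gives the loop its low-latency, game-engine-like feedback.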
Problem

Research questions and friction points this paper is trying to address.

Generating photorealistic video from user control signals
Ensuring temporal consistency across iteratively generated chunks
Transferring control signals learned in virtual games to real-world entities
Innovation

Methods, ideas, or system contributions that make the work stand out.

First neural-network-based real-world game engine
Photorealistic, temporally consistent interactive video generation
Generalizes control from a racing game to diverse real-world entities