Beyond Pixel Histories: World Models with Persistent 3D State

📅 2026-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing interactive world models, which suffer from constrained spatial memory and insufficient 3D consistency due to the absence of explicit 3D environmental representations, thereby hindering long-term stable generation and downstream agent tasks. To overcome this, the authors propose PERSIST, a novel paradigm that integrates a persistent 3D state into world models for the first time. By combining implicit 3D scene modeling, differentiable rendering, and a dynamic state evolution mechanism, PERSIST jointly leverages spatial memory and user actions to generate videos. The method enables diverse yet geometrically consistent 3D environments from a single image and supports fine-grained, 3D-aware editing and control. Experiments demonstrate that PERSIST significantly outperforms current approaches in spatial memory, 3D consistency, and long-horizon stability, with both quantitative metrics and user studies confirming its superior generation quality.

📝 Abstract
Interactive world models continually generate video by responding to a user's actions, enabling open-ended generation capabilities. However, existing models typically lack a 3D representation of the environment, meaning 3D consistency must be implicitly learned from data, and spatial memory is restricted to limited temporal context windows. This results in an unrealistic user experience and presents significant obstacles to downstream tasks such as training agents. To address this, we present PERSIST, a new paradigm of world model that simulates the evolution of a latent 3D scene: environment, camera, and renderer. This allows us to synthesize new frames with persistent spatial memory and consistent geometry. Both quantitative metrics and a qualitative user study show substantial improvements in spatial memory, 3D consistency, and long-horizon stability over existing methods, enabling coherent, evolving 3D worlds. We further demonstrate novel capabilities, including synthesizing diverse 3D environments from a single image, as well as enabling fine-grained, geometry-aware control over generated experiences by supporting environment editing and specification directly in 3D space. Project page: https://francelico.github.io/persist.github.io
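The abstract's core loop (a persistent latent scene whose state is evolved by user actions and then rendered into frames) can be sketched as below. This is a minimal illustrative toy, not PERSIST's actual architecture: all class and function names are hypothetical, the "environment" is a placeholder for an implicit 3D scene representation, and the renderer is a stand-in for differentiable rendering.

```python
from dataclasses import dataclass, field

@dataclass
class SceneState:
    """Hypothetical persistent 3D state: the scene outlives any context window."""
    environment: dict                 # placeholder for an implicit 3D scene model
    camera_pose: list                 # toy camera position [x, y, z]
    history: list = field(default_factory=list)  # frames rendered so far

def evolve(state: SceneState, action: str) -> SceneState:
    """Toy state-evolution step: a user action updates the persistent state."""
    x, y, z = state.camera_pose
    moves = {"forward": (0, 0, 1), "back": (0, 0, -1),
             "left": (-1, 0, 0), "right": (1, 0, 0)}
    dx, dy, dz = moves.get(action, (0, 0, 0))
    state.camera_pose = [x + dx, y + dy, z + dz]
    return state

def render(state: SceneState) -> str:
    """Stand-in renderer: a frame is a deterministic function of the scene state."""
    frame = f"frame@{tuple(state.camera_pose)}"
    state.history.append(frame)
    return frame

# Because the scene persists, revisiting a pose reproduces the same view,
# unlike pixel-history models whose memory is bounded by a context window.
state = SceneState(environment={}, camera_pose=[0, 0, 0])
first = render(state)
evolve(state, "forward")
evolve(state, "back")
revisit = render(state)
assert first == revisit  # geometrically consistent on return
```

The point of the sketch is the contrast the paper draws: here consistency comes from state, not from a model having to relearn geometry from recent frames.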
Problem

Research questions and friction points this paper is trying to address.

world models
3D consistency
spatial memory
interactive generation
persistent state
Innovation

Methods, ideas, or system contributions that make the work stand out.

world models
persistent 3D state
spatial memory
3D consistency
geometry-aware control