🤖 AI Summary
World simulation suffers from deteriorating long-term spatiotemporal consistency, particularly in 3D scene evolution, caused by the limited context window of Transformer architectures. To address this, we propose the first world simulation framework to enable long-term consistency. Our method introduces: (1) a stateful memory bank jointly indexed by pose and timestamp; (2) a state-aware memory attention mechanism that enables precise cross-view and temporal scene reconstruction; and (3) a joint spatiotemporal representation coupled with an end-to-end differentiable reconstruction module. Evaluated on both synthetic and real-world scenes, our approach reduces reconstruction error by 42% under large viewpoint shifts and extended time horizons, significantly improving long-term consistency metrics. Moreover, it supports continual perception and physics-aware interaction.
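The memory bank indexed jointly by pose and timestamp can be sketched as below. This is a minimal illustration, not the paper's implementation: the class names, the Euclidean pose distance, and the linear pose/time weighting are all assumptions.

```python
import math

class MemoryUnit:
    """One stored observation plus its state (pose and timestamp).

    Hypothetical structure illustrating a WorldMem-style memory unit;
    field names are assumptions, not the paper's API.
    """
    def __init__(self, frame, pose, timestamp):
        self.frame = frame          # e.g. an image or feature map
        self.pose = pose            # (x, y, z) camera position, simplified
        self.timestamp = timestamp  # scalar simulation time

class MemoryBank:
    """Memory bank jointly indexed by pose and timestamp."""
    def __init__(self, pose_weight=1.0, time_weight=0.1):
        self.units = []
        self.pose_weight = pose_weight  # assumed relative weighting of
        self.time_weight = time_weight  # pose vs. temporal distance

    def add(self, frame, pose, timestamp):
        self.units.append(MemoryUnit(frame, pose, timestamp))

    def query(self, pose, timestamp, k=3):
        """Return the k units whose state is closest to the query state."""
        def state_distance(u):
            d_pose = math.dist(u.pose, pose)
            d_time = abs(u.timestamp - timestamp)
            return self.pose_weight * d_pose + self.time_weight * d_time
        return sorted(self.units, key=state_distance)[:k]

# Usage: store three observations, then retrieve those nearest a query state.
bank = MemoryBank()
bank.add("frame_a", (0.0, 0.0, 0.0), 0)
bank.add("frame_b", (5.0, 0.0, 0.0), 1)
bank.add("frame_c", (0.1, 0.0, 0.0), 9)
hits = bank.query((0.0, 0.0, 0.0), timestamp=0, k=2)
print([u.frame for u in hits])  # → ['frame_a', 'frame_c']
```

Retrieved frames would then be fed to the memory attention mechanism rather than returned directly; a real system would also cap the bank's size and use a spatial index instead of a linear scan.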
📝 Abstract
World simulation has gained increasing popularity due to its ability to model virtual environments and predict the consequences of actions. However, the limited temporal context window often leads to failures in maintaining long-term consistency, particularly in preserving 3D spatial consistency. In this work, we present WorldMem, a framework that enhances scene generation with a memory bank consisting of memory units that store memory frames and states (e.g., poses and timestamps). By employing a memory attention mechanism that effectively extracts relevant information from these memory frames based on their states, our method can accurately reconstruct previously observed scenes, even across significant viewpoint changes or temporal gaps. Furthermore, by incorporating timestamps into the states, our framework not only models a static world but also captures its dynamic evolution over time, enabling both perception and interaction within the simulated world. Extensive experiments in both virtual and real scenarios validate the effectiveness of our approach.
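The state-based memory attention described above can be sketched as a toy single-head attention over memory frames. This is a simplified assumption of how state conditioning might work, not the paper's exact formulation: state embeddings (encoding pose and timestamp) are simply added to the query and keys, so memories whose states match the current state receive higher weights.

```python
import math

def state_aware_attention(query_feat, query_state, memory_feats, memory_states):
    """Toy single-head memory attention conditioned on state.

    query_feat / query_state: feature and state-embedding vectors (lists).
    memory_feats / memory_states: one vector per stored memory frame.
    Additive state conditioning and a single head are simplifying
    assumptions for illustration.
    """
    d = len(query_feat)
    # Add the state embedding to the query (analogous to a positional code).
    q = [f + s for f, s in zip(query_feat, query_state)]
    scores = []
    for feat, state in zip(memory_feats, memory_states):
        k = [f + s for f, s in zip(feat, state)]         # state-conditioned key
        scores.append(sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d))
    # Numerically stable softmax over the memory frames.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Blend memory features by attention weight into one readout vector.
    return [sum(w * feat[i] for w, feat in zip(weights, memory_feats))
            for i in range(d)]
```

With two stored memories, the one whose state embedding matches the query state dominates the readout, which is the behavior that lets the generator reconstruct a previously observed scene from the matching memory frame.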