🤖 AI Summary
Existing world modeling approaches, such as Transformer- and Mamba-based architectures, struggle to encode spatial and temporal structure efficiently when modeling long-horizon, high-dimensional spatiotemporal sequences. To address this, we propose the FACTored State-space (FACTS) model, a novel recurrent architecture that combines a graph-structured memory with a permutation-invariant routing mechanism, so that its memory representations do not depend on the order of the inputs. FACTS factorizes its recurrent state, updates it through selective state-space propagation, and supports parallel computation over high-dimensional sequences. FACTS was evaluated on three task families: multivariate time series forecasting, object-centric world modeling, and spatiotemporal graph forecasting. It matches or surpasses specialized state-of-the-art models despite its general-purpose design, offering a single framework for modeling complex dynamical systems.
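A selective state-space update can stay parallelizable because it is still a linear recurrence. Its coefficients depend on the input, but the update itself is linear, so a prefix scan can evaluate it. The NumPy sketch below illustrates this general property under stated assumptions and is not the FACTS parameterization. It assumes a diagonal recurrence h_t = a_t * h_{t-1} + u_t with an input-dependent decay a_t. The names `selective_coeffs`, `W_a`, and `W_b` are hypothetical. It checks that a step-by-step loop and a Hillis-Steele associative scan give the same hidden states.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def selective_coeffs(x, W_a, W_b):
    """Hypothetical input-dependent (selective) coefficients for h_t = a_t * h_{t-1} + u_t."""
    a = sigmoid(x @ W_a)  # per-step, per-channel decay in (0, 1)
    u = x @ W_b           # per-step input drive
    return a, u

def sequential_scan(a, u):
    """Reference recurrence, one time step after another (sequential depth T)."""
    h = np.zeros(a.shape[1])
    out = np.empty_like(u)
    for t in range(a.shape[0]):
        h = a[t] * h + u[t]
        out[t] = h
    return out

def parallel_scan(a, u):
    """Hillis-Steele inclusive scan over affine maps h -> a*h + u (depth ceil(log2 T))."""
    A, B = a.copy(), u.copy()
    offset = 1
    while offset < A.shape[0]:
        A_new, B_new = A.copy(), B.copy()
        # Compose the earlier map (t - offset) with the later map (t):
        # a_l * (a_e * h + b_e) + b_l = (a_e * a_l) * h + (a_l * b_e + b_l)
        A_new[offset:] = A[:-offset] * A[offset:]
        B_new[offset:] = A[offset:] * B[:-offset] + B[offset:]
        A, B = A_new, B_new
        offset *= 2
    return B  # with h_{-1} = 0, B[t] equals h_t

rng = np.random.default_rng(0)
T, D = 13, 4
x = rng.standard_normal((T, D))
W_a = rng.standard_normal((D, D)) / np.sqrt(D)
W_b = rng.standard_normal((D, D)) / np.sqrt(D)

a, u = selective_coeffs(x, W_a, W_b)
h_seq = sequential_scan(a, u)
h_par = parallel_scan(a, u)
print(np.allclose(h_seq, h_par))  # True
```

Both functions produce the same states. The scan version needs only ceil(log2 T) rounds of elementwise operations, which is 4 rounds for T = 13, so its depth grows logarithmically rather than linearly in sequence length.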
📝 Abstract
World modelling, which learns both spatial and temporal dependencies, is essential for understanding and predicting the dynamics of complex systems. However, current frameworks, such as Transformers and selective state-space models like Mamba, are limited in how efficiently they encode spatial and temporal structure, particularly in scenarios that require long-term, high-dimensional sequence modelling. To address these issues, we propose a novel recurrent framework, the FACTored State-space (FACTS) model, for spatial-temporal world modelling. FACTS constructs a graph-structured memory with a routing mechanism that learns permutable memory representations. These representations are invariant to input permutations and adapt through selective state-space propagation. Furthermore, FACTS supports parallel computation over high-dimensional sequences. We empirically evaluate FACTS on diverse tasks, including multivariate time series forecasting, object-centric world modelling, and spatial-temporal graph prediction. It consistently outperforms or matches specialised state-of-the-art models despite its general-purpose world modelling design.
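The following minimal NumPy sketch makes the permutation-invariance property concrete. It shows one way a fixed set of K memory slots could read N unordered input tokens. Each slot attends over the inputs with a softmax across the input axis and takes a weighted sum, which does not depend on input order. A per-slot gate that depends on the input then stands in for selective state-space propagation. Everything here is a hypothetical illustration rather than the FACTS routing mechanism. This includes `route`, `memory_step`, the `W_q`/`W_k`/`W_v`/`W_g` matrices, and the choice of attention-based routing itself. The graph structure among slots is not modelled.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def route(memory, inputs, W_q, W_k, W_v):
    """Hypothetical routing: read N unordered input tokens into K memory slots."""
    q = memory @ W_q                        # (K, D) slot queries
    k = inputs @ W_k                        # (N, D) input keys
    v = inputs @ W_v                        # (N, D) input values
    scores = q @ k.T / np.sqrt(q.shape[1])  # (K, N)
    weights = softmax(scores, axis=1)       # each slot's weights over the N inputs
    return weights @ v                      # (K, D) weighted sum, so input order is irrelevant

def memory_step(memory, inputs, params):
    """One recurrent update: route the inputs, then apply an input-dependent gate per slot."""
    routed = route(memory, inputs, params["W_q"], params["W_k"], params["W_v"])
    gate = 1.0 / (1.0 + np.exp(-(routed @ params["W_g"])))  # selective decay in (0, 1)
    return gate * memory + (1.0 - gate) * routed

rng = np.random.default_rng(0)
K, N, D, T = 3, 5, 8, 4
params = {name: rng.standard_normal((D, D)) / np.sqrt(D)
          for name in ("W_q", "W_k", "W_v", "W_g")}
memory0 = rng.standard_normal((K, D))
frames = rng.standard_normal((T, N, D))  # T time steps, N unordered tokens each

# Roll the memory forward twice: once with the original token order, and once
# with the tokens of every frame independently shuffled.
m_orig, m_perm = memory0, memory0
for frame in frames:
    m_orig = memory_step(m_orig, frame, params)
    m_perm = memory_step(m_perm, frame[rng.permutation(N)], params)

print(m_orig.shape)                 # (3, 8)
print(np.allclose(m_orig, m_perm))  # True
```

Shuffling a frame permutes the rows of the keys and values identically. The slot-wise weighted sum is therefore unchanged, and because each step's memory is identical, the invariance carries through the whole recurrence.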