SmallWorlds: Assessing Dynamics Understanding of World Models in Isolated Environments

Date: 2025-11-28
Citations: 0
Influential: 0
AI Summary
World-model evaluation currently lacks a unified, controllable dynamical environment for systematically testing whether a model understands the underlying physical rules. To address this, we propose SmallWorld, a benchmark designed for controlled dynamical assessment of world models. It covers six distinct domains and operates under fully observable conditions without requiring hand-crafted reward signals. The framework supports forward modeling and multi-step prediction analysis across mainstream architectures, including recurrent state-space models, Transformers, diffusion models, and neural ODEs. Our experiments quantitatively characterize how these models differ in structural awareness and in how their long-horizon rollouts degrade. These findings provide a reproducible empirical reference point for representation learning and dynamical modeling, and they point to concrete directions for architectural improvement.

Abstract
Current world models lack a unified and controlled setting for systematic evaluation, making it difficult to assess whether they truly capture the underlying rules that govern environment dynamics. In this work, we address this open challenge by introducing the SmallWorld Benchmark, a testbed designed to assess world model capability under isolated and precisely controlled dynamics without relying on handcrafted reward signals. Using this benchmark, we conduct comprehensive experiments in the fully observable state space on representative architectures including Recurrent State Space Model, Transformer, Diffusion model, and Neural ODE, examining their behavior across six distinct domains. The experimental results reveal how effectively these models capture environment structure and how their predictions deteriorate over extended rollouts, highlighting both the strengths and limitations of current modeling paradigms and offering insights into future improvement directions in representation learning and dynamics modeling.
Problem

Research questions and friction points this paper is trying to address.

Evaluating world models' dynamics understanding lacks unified controlled settings
Assessing model capability through isolated environments without reward signals
Testing how well models capture environment structure and how their predictions deteriorate over extended rollouts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces SmallWorld Benchmark for controlled evaluation
Tests diverse architectures in isolated dynamics environments
Analyzes prediction degradation over extended model rollouts
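
The rollout-degradation analysis mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the damped oscillator is a toy stand-in rather than one of the benchmark's six domains, `model_step` is a hypothetical "learned" model with a deliberately wrong stiffness, and per-step squared error is one plausible metric, not necessarily the one the paper reports.

```python
import numpy as np

def true_step(state, dt=0.05, k=1.0, c=0.1):
    """One semi-implicit Euler step of a damped oscillator (toy environment)."""
    x, v = state
    v_next = v + dt * (-k * x - c * v)
    x_next = x + dt * v_next
    return np.array([x_next, v_next])

def model_step(state, dt=0.05, k=1.05, c=0.1):
    """Hypothetical learned model: same functional form, slightly wrong stiffness."""
    return true_step(state, dt=dt, k=k, c=c)

def rollout(step_fn, state0, horizon):
    """Apply step_fn repeatedly, feeding each prediction back in (open loop)."""
    states = [np.asarray(state0, dtype=float)]
    for _ in range(horizon):
        states.append(step_fn(states[-1]))
    return np.stack(states)  # shape: (horizon + 1, state_dim)

def rollout_error(true_fn, model_fn, init_states, horizon):
    """Squared state error at each step, averaged over initial states."""
    errs = []
    for s0 in init_states:
        truth = rollout(true_fn, s0, horizon)
        pred = rollout(model_fn, s0, horizon)
        errs.append(np.sum((truth - pred) ** 2, axis=1))
    return np.mean(errs, axis=0)  # shape: (horizon + 1,)

rng = np.random.default_rng(0)
init_states = rng.uniform(-1.0, 1.0, size=(16, 2))
errors = rollout_error(true_step, model_step, init_states, horizon=100)
print(errors[[1, 10, 50, 100]])
```

Because the model is fed its own predictions, a small one-step mismatch compounds over the horizon. Plotting `errors` against the step index gives the kind of degradation curve that a long-horizon comparison between architectures would examine.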
Xinyi Li
University of California, Davis
Zaishuo Xia
University of California, Davis
Weyl Lu
University of California, Davis
Chenjie Hao
University of California, Davis
Yubei Chen
UC Davis | Aizip.ai
Unsupervised Learning, World Models, Science 4 AI