🤖 AI Summary
Existing neural world models suffer from limited adaptability to environmental dynamics and poor interpretability. Method: This paper proposes a neuro-symbolic world modeling approach for game videos, centered on a Finite Automata Extraction (FAE) framework that integrates self-supervised feature learning with program synthesis to automatically infer structured environmental dynamics from raw video under low-data conditions. It further introduces Retro Coder, a domain-specific language (DSL) that enables human-readable, editable, and formally verifiable programmatic representations of dynamic behaviors. Contribution/Results: Experiments demonstrate that the proposed model significantly outperforms purely neural baselines in prediction accuracy, cross-environment generalization, and human interpretability. By unifying neural perception with symbolic reasoning, it establishes a novel paradigm for explainable AI-driven world modeling.
📝 Abstract
A world model is a compressed, learned spatial and temporal representation of an environment. This representation is typically a neural network, which makes it difficult both to transfer the learned environment dynamics and to explain them. In this paper, we propose Finite Automata Extraction (FAE), an approach that learns a neuro-symbolic world model from gameplay video, with the model represented as programs in a novel domain-specific language (DSL), Retro Coder. Compared to prior world model approaches, FAE learns a more precise model of the environment, and it produces more general code than prior DSL-based approaches.