🤖 AI Summary
Current evaluation of AI world models relies excessively on static data fitting, neglecting agents' capacity to rapidly construct and update internal models through interaction in novel environments. Method: We propose a dynamic evaluation framework based on "novel games": interactive tasks featuring deep, persistent environmental novelty, grounded in cognitive science principles and equipped with carefully designed exploration protocols and environment-evolution mechanisms; we introduce quantitative metrics to assess inductive efficiency and adaptation speed. Contribution/Results: This framework shifts world model evaluation from static representation learning to dynamic model construction, establishing the first quantifiable benchmark for rapid adaptation. It enables rigorous measurement of how quickly and effectively agents generalize and refine world models amid continuous environmental change, thereby advancing research on adaptive world models for artificial general intelligence.
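To make the proposed metrics concrete, one plausible operationalization scores an agent's per-episode reward curve in a freshly generated novel game. The function names and formulas below are illustrative assumptions, not the paper's definitions: `inductive_efficiency` as normalized area under the learning curve, and `adaptation_speed` as the first episode reaching a fraction of the best observed reward.

```python
# Hypothetical metric sketch for evaluating rapid world model induction.
# All names and formulas here are illustrative assumptions, not definitions
# taken from the paper.

def inductive_efficiency(rewards, r_min, r_max):
    """Normalized area under the per-episode reward curve in a novel game.

    A value of 1.0 would mean ceiling performance from the very first
    episode (instant world model induction); 0.0 means no learning at all.
    """
    if not rewards:
        raise ValueError("need at least one episode")
    span = r_max - r_min
    return sum((r - r_min) / span for r in rewards) / len(rewards)

def adaptation_speed(rewards, threshold=0.9):
    """Episodes needed to first reach `threshold` of the best observed reward.

    Returns None if the agent never reaches the threshold.
    """
    target = threshold * max(rewards)
    for episode, r in enumerate(rewards, start=1):
        if r >= target:
            return episode
    return None

if __name__ == "__main__":
    # Toy learning curve from a hypothetical agent in one novel game.
    curve = [0.1, 0.3, 0.6, 0.85, 0.9, 0.92]
    print(inductive_efficiency(curve, r_min=0.0, r_max=1.0))  # ~0.61
    print(adaptation_speed(curve))  # 4
```

Averaging these scores over a suite of continually refreshed games, rather than a single fixed environment, is what would distinguish this dynamic evaluation from static benchmark fitting.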
📝 Abstract
Human intelligence exhibits a remarkable capacity for rapid adaptation and effective problem-solving in novel and unfamiliar contexts. We argue that this profound adaptability is fundamentally linked to the efficient construction and refinement of internal representations of the environment, commonly referred to as world models, and we refer to this adaptation mechanism as world model induction. However, current understanding and evaluation of world models in artificial intelligence (AI) remain narrow, often focusing on static representations learned from training on massive corpora of data rather than on the efficiency and efficacy with which models learn these representations through interaction and exploration within a novel environment. In this Perspective, we provide a view of world model induction that draws on decades of cognitive science research on how humans learn and adapt so efficiently; we then call for a new evaluation framework for assessing adaptive world models in AI. Concretely, we propose a new benchmarking paradigm based on suites of carefully designed games with genuine, deep and continually refreshing novelty in the underlying game structures -- we refer to these as novel games. We detail key desiderata for constructing such games and propose appropriate metrics to explicitly challenge and evaluate an agent's ability for rapid world model induction. We hope that this new evaluation framework will inspire future evaluation efforts on world models in AI and provide a crucial step towards developing AI systems capable of human-like rapid adaptation and robust generalization -- a critical component of artificial general intelligence.