AI Summary
This work addresses the lack of a systematic theoretical foundation in existing world models for simulating human cognition, particularly with respect to motivation and metacognition. Drawing upon cognitive architecture theory, it proposes a unified "cognitive world model" framework that integrates memory, perception, language, reasoning, imagination, motivation, and metacognition for the first time. The study advances this integration by combining active inference with global workspace theory to outline a novel pathway toward human-like intelligence. Furthermore, it introduces a taxonomy that organizes world models by cognitive function and applies it across video-based, embodied, and epistemic world models, thereby clarifying current research gaps. This framework establishes a coherent theoretical basis and provides a roadmap for developing artificial systems that exhibit human-like cognitive capabilities.
Abstract
This comprehensive report distinguishes prior works by the cognitive functions in which they innovate. Many works claim an almost "human-like" cognitive capability in their world models. Evaluating these claims requires a proper grounding in the first principles of Cognitive Architecture Theory (CAT). We present a unified conceptual framework for world models that incorporates all the cognitive functions associated with CAT (i.e., memory, perception, language, reasoning, imagining, motivation, and meta-cognition) and identify gaps in the research to guide future state-of-the-art work. In particular, we find that motivation (especially intrinsic motivation) and meta-cognition remain drastically under-researched, and we propose concrete directions informed by active inference and global workspace theory to address them. We further introduce Epistemic World Models, a new category encompassing agent frameworks for scientific discovery that operate over structured knowledge. Our taxonomy, applied across video, embodied, and epistemic world models, suggests research directions that prior taxonomies have not.