🤖 AI Summary
Can general-purpose agents achieve flexible, multi-step, goal-directed behavior solely via model-free learning? This paper provides the first formal proof that world models are necessary for universal goal-directed generalization. We introduce a novel paradigm for *inverse extraction* of world models from trained policies—enabling high-fidelity model recovery without explicit model-based training. Our framework establishes quantitative relationships among goal complexity, policy performance, and world model accuracy, integrating tools from control theory, causal representation learning, and policy interpretability analysis. Key contributions include: (1) a necessity theorem proving that world models are indispensable for universal goal-directed generalization; (2) design principles for safe and controllable agents grounded in world model fidelity; (3) a characterization framework for environmental capability boundaries; and (4) a high-accuracy, model-free world model extraction algorithm.
📝 Abstract
Are world models a necessary ingredient for flexible, goal-directed behaviour, or is model-free learning sufficient? We provide a formal answer to this question, showing that any agent capable of generalizing to multi-step goal-directed tasks must have learned a predictive model of its environment. We show that this model can be extracted from the agent's policy, and that increasing the agent's performance, or the complexity of the goals it can achieve, requires learning increasingly accurate world models. This has a number of consequences, from developing safe and general agents to bounding agent capabilities in complex environments and providing new algorithms for eliciting world models from agents.
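To make the extraction claim concrete, here is a minimal toy sketch (not the paper's actual algorithm) of the underlying intuition: a goal-conditioned policy can be queried as a black box with simple reachability goals, and its action choices reveal the transition model it has implicitly learned. The MDP, the `policy` stand-in, and the assumption that the policy abstains on unreachable goals are all illustrative assumptions, not from the paper.

```python
# Toy sketch: eliciting an implicit world model from a goal-conditioned
# policy via black-box queries. All names and details are illustrative.

# Ground-truth deterministic transitions of a tiny 3-state MDP:
# T[(state, action)] = next_state
T = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 0, (1, "b"): 2,
    (2, "a"): 2, (2, "b"): 0,
}

def policy(state, goal):
    """Stand-in for a trained goal-conditioned agent: returns an action
    it believes reaches `goal` in one step, or None if it judges the
    goal unreachable. Below it is only ever used as a black box."""
    for (s, a), s_next in T.items():
        if s == state and s_next == goal:
            return a
    return None

def extract_world_model(policy, states, goals):
    """Recover transitions from policy queries alone: if the agent
    chooses action `a` at state `s` to achieve goal `g`, it must
    believe that (s, a) leads to g."""
    model = {}
    for s in states:
        for g in goals:
            a = policy(s, g)
            if a is not None:
                model[(s, a)] = g
    return model

extracted = extract_world_model(policy, states=range(3), goals=range(3))
```

In this deterministic toy, the extracted model exactly matches the true transitions `T`, illustrating the paper's broader point that competent goal-directed behaviour leaks a recoverable world model.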