🤖 AI Summary
Large language models (LLMs) lack causal intuition about physical dynamics, which severely limits their zero-shot physical reasoning in real-world scenarios. To address this, we propose the Causal World Model Induction (CWMI) framework, which introduces a causal intervention loss that steers multimodal learning toward genuine causal mechanisms rather than spurious statistical correlations. We also design a Causal Physics Module (CPM), trained end-to-end with counterfactual intervention predictions as the supervisory signal. On PIQA and our newly constructed PhysiCa-Bench, a rigorous physical causality benchmark, CWMI significantly outperforms existing state-of-the-art models, enabling LLMs to model and generalize physical causal relationships more reliably under zero-shot conditions. By grounding physical reasoning in causal semantics, CWMI provides a scalable architectural foundation for embodied intelligence and causal AI.
📝 Abstract
Large Language Models (LLMs), despite their advanced linguistic capabilities, fundamentally lack an intuitive understanding of physical dynamics, which limits their effectiveness in real-world scenarios that require causal reasoning. In this paper, we introduce Causal World Model Induction (CWMI), a novel framework designed to embed an explicit model of causal physics within an LLM. Our approach combines a dedicated Causal Physics Module (CPM) with a new training objective, the Causal Intervention Loss, which encourages the model to learn cause-and-effect relationships from multimodal data. By training the model to predict the outcomes of hypothetical interventions instead of merely capturing statistical correlations, CWMI develops a robust internal representation of physical laws. Experimental results show that CWMI significantly outperforms state-of-the-art LLMs on zero-shot physical reasoning tasks, including the PIQA benchmark and our newly proposed PhysiCa-Bench dataset. These findings demonstrate that inducing a causal world model is a critical step toward more reliable and generalizable AI systems.
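The central contrast above, predicting what happens under an intervention rather than what merely co-occurs in passive data, can be illustrated with a toy example. The sketch below is not the paper's CPM or its Causal Intervention Loss, whose exact form is not given in the abstract. It assumes a hypothetical three-variable structural causal model in which a hidden confounder Z drives both X and Y, and it uses mean squared error on interventional samples as a stand-in "intervention loss". All names (`outcome`, `fit_linear`, `intervention_loss`, the coefficients) are illustrative assumptions.

```python
from itertools import product
from statistics import fmean

# Hypothetical toy structural causal model (an assumption, not the paper's setup):
#   hidden confounder Z -> X, Z -> Y, and X -> Y.
Z_VALUES = [0, 1, 2, 3]
X_EFFECT, Z_EFFECT = 2.0, 3.0  # assumed structural coefficients


def outcome(x, z):
    """Structural equation for Y given X and the confounder Z."""
    return X_EFFECT * x + Z_EFFECT * z


def observational_data():
    """Passive observation: X simply copies Z, so X and Z are perfectly correlated."""
    return [(z, outcome(z, z)) for z in Z_VALUES]


def interventional_data():
    """do(X = x): X is set externally, cutting the Z -> X edge,
    so every (x, z) combination occurs."""
    return [(x, outcome(x, z)) for x, z in product(Z_VALUES, Z_VALUES)]


def fit_linear(pairs):
    """Ordinary least squares for y ~ w * x + b (Z is never observed by the model)."""
    xs, ys = zip(*pairs)
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    var = sum((x - mx) ** 2 for x in xs)
    w = cov / var
    return w, my - w * mx


def intervention_loss(model, pairs):
    """Stand-in 'intervention loss': MSE of predictions on interventional samples."""
    w, b = model
    return fmean((w * x + b - y) ** 2 for x, y in pairs)


if __name__ == "__main__":
    obs_model = fit_linear(observational_data())      # absorbs the confounding
    causal_model = fit_linear(interventional_data())  # recovers the effect of X
    print(obs_model)     # (5.0, 0.0)
    print(causal_model)  # (2.0, 4.5)
    data = interventional_data()
    print(intervention_loss(obs_model, data))     # 22.5
    print(intervention_loss(causal_model, data))  # 11.25
```

The correlational fit attributes Z's influence to X (slope 5), so it predicts intervention outcomes poorly; the fit trained on interventional samples recovers the true effect of X (slope 2). Its remaining loss comes only from the unobserved confounder, which no model of X alone can remove.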