🤖 AI Summary
This survey addresses a limitation of current world models: they often prioritize visual fidelity over physical plausibility and causal structure, which undermines their use for intervention, long-horizon prediction, and safety-critical decision-making. It argues that visual realism is an unreliable proxy for world understanding and reframes world models as actionable simulators rather than visual engines. The proposed framing emphasizes structured 4D interfaces, constraint-aware dynamics, and closed-loop evaluation. Using medical decision-making as a stress test, where trial and error is impossible and errors are irreversible, the authors argue that a world model's value lies in its support for counterfactual reasoning, intervention planning, and robust long-horizon foresight, not in how realistic its rollouts appear.
📝 Abstract
A world model is an AI system that simulates how an environment evolves under actions, enabling planning through imagined futures rather than reactive perception. Current world models, however, suffer from visual conflation: the mistaken assumption that high-fidelity video generation implies an understanding of physical and causal dynamics. We show that while modern models excel at predicting pixels, they frequently violate invariant constraints, fail under intervention, and break down in safety-critical decision-making. This survey argues that visual realism is an unreliable proxy for world understanding. Instead, effective world models must encode causal structure, respect domain-specific constraints, and remain stable over long horizons. We propose a reframing of world models as actionable simulators rather than visual engines, emphasizing structured 4D interfaces, constraint-aware dynamics, and closed-loop evaluation. Using medical decision-making as an epistemic stress test, where trial-and-error is impossible and errors are irreversible, we demonstrate that a world model's value is determined not by how realistic its rollouts appear, but by its ability to support counterfactual reasoning, intervention planning, and robust long-horizon foresight.