🤖 AI Summary
Long-horizon robotic planning must account for exogenous environmental dynamics (such as water heating or domino effects) that evolve independently of the agent's actions, yet existing world models often neglect such non-autonomous changes. This work introduces a learnable abstract world model that couples symbolic state abstractions with causal models of both endogenous agent actions and exogenous environmental mechanisms, where each causal process captures the time course of a stochastic cause-effect relation. Methodologically, it integrates large language model (LLM)-guided structural priors with variational Bayesian inference to learn these stochastic temporal causal effects efficiently from few-shot observations. The key contribution is the first explicit incorporation of exogenous processes into a learnable symbolic causal model, enabling unified representation and reasoning over both endogenous and exogenous dynamics. Evaluated on five simulated robotic tasks, the model achieves significantly higher long-horizon planning success rates and superior cross-task generalization compared to multiple baselines.
📝 Abstract
Long-horizon embodied planning is challenging because the world does not change only through an agent's actions: exogenous processes (e.g., water heating, dominoes cascading) unfold concurrently. We propose a framework for abstract world models that jointly learns (i) symbolic state representations and (ii) causal processes for both endogenous actions and exogenous mechanisms. Each causal process models the time course of a stochastic causal-effect relation. We learn these world models from limited data via variational Bayesian inference combined with LLM proposals. Across five simulated tabletop robotics environments, the learned models enable fast planning that generalizes to held-out tasks with more objects and more complex goals, outperforming a range of baselines.
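To make the core abstraction concrete, here is a minimal sketch of what a "causal process" world model could look like: symbolic states as sets of predicates, and processes that carry a precondition, a delayed stochastic effect, and an endogenous/exogenous flag. All names here (`CausalProcess`, `step`, the stove example) are illustrative assumptions, not the paper's actual implementation or learned representation.

```python
import random
from dataclasses import dataclass

# Hypothetical sketch: a symbolic state is a frozenset of true predicates.

@dataclass
class CausalProcess:
    """One stochastic causal-effect relation with a time course.

    exogenous=True marks environment mechanisms (e.g. water heating)
    that fire whenever their precondition holds, independent of the agent.
    """
    name: str
    precondition: frozenset   # predicates that must hold for the process to start
    effect: frozenset         # predicates added when the process completes
    duration: int             # time steps until the effect lands
    success_prob: float = 1.0 # stochastic: the effect may fail to materialize
    exogenous: bool = False

def step(state, active, processes, action=None, rng=random):
    """Advance the world one tick: start eligible processes, count down
    active ones, and apply effects whose duration has elapsed."""
    for p in processes:
        # Endogenous processes start only if chosen as the agent's action;
        # exogenous ones start on their own whenever enabled.
        chosen = p.exogenous or (p.name == action)
        if chosen and p.precondition <= state and p.name not in active:
            active[p.name] = p.duration
    new_state = set(state)
    for p in processes:
        if p.name in active:
            active[p.name] -= 1
            if active[p.name] == 0:
                del active[p.name]
                if rng.random() < p.success_prob:
                    new_state |= p.effect
    return frozenset(new_state), active

# Example: the agent turns the stove on; heating then unfolds exogenously.
turn_on = CausalProcess("turn_on_stove", frozenset(), frozenset({"stove_on"}), 1)
heating = CausalProcess("heat_water", frozenset({"stove_on"}),
                        frozenset({"water_hot"}), 3, exogenous=True)
procs = [turn_on, heating]

state, active = frozenset(), {}
state, active = step(state, active, procs, action="turn_on_stove")
for _ in range(3):
    # No further agent actions: the water heats up anyway.
    state, active = step(state, active, procs)
assert "water_hot" in state
```

A planner over such a model must simulate exogenous processes during every tick of a plan, not just at action boundaries, which is what makes long-horizon reasoning with concurrent environmental change harder than classical action-only planning.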