AI Summary
Directly employing large language models (LLMs) as complete world models for planning is inefficient and lacks precision. This work proposes treating the LLM as a task-agnostic, language-conditioned, affordance-driven partial world model that captures only the subset of states and actions relevant to the user's intent. By integrating affordance theory, LLMs, and formal modeling techniques, the approach constructs an intent-oriented, compact world model that substantially reduces the search branching factor. Evaluated on tabletop robotic tasks, the method achieves more efficient planning and higher cumulative rewards compared to full world models, while effectively supporting multi-task scenarios.
Abstract
Full models of the world require complex knowledge of immense detail. While pre-trained large models have been hypothesized to contain similar knowledge due to extensive pre-training on vast amounts of internet-scale data, using them directly in a search procedure is inefficient and inaccurate. Conversely, partial models focus on making high-quality predictions for a subset of states and actions: those linked through affordances that achieve user intents~\citep{khetarpal2020can}. Can we posit large models as partial world models? We provide a formal answer to this question, proving that agents achieving task-agnostic, language-conditioned intents necessarily possess predictive partial-world models informed by affordances. In the multi-task setting, we introduce distribution-robust affordances and show that partial models can be extracted to significantly improve search efficiency. Empirical evaluations in tabletop robotics tasks demonstrate that our affordance-aware partial models reduce the search branching factor and achieve higher rewards compared to full world models.
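To make the branching-factor claim concrete, the following is a minimal toy sketch, not the paper's method. It assumes a hypothetical tabletop domain in which an action is a (skill, object) pair, and it hardcodes the affordance set that, in the setting described above, a language-conditioned model would supply. All names (`OBJECTS`, `SKILLS`, `afforded_actions`, `num_plans`) are illustrative assumptions.

```python
from itertools import product

# Hypothetical tabletop domain; these names are illustrative, not from the paper.
OBJECTS = ["red_block", "blue_block", "bowl", "sponge"]
SKILLS = ["pick", "place_on", "push", "wipe"]


def full_actions():
    """Every (skill, object) pair: the action set a full model must consider."""
    return list(product(SKILLS, OBJECTS))


def afforded_actions(intent_skills, intent_objects):
    """Keep only actions whose skill and object are relevant to the intent.

    The two sets stand in for the affordances that a language-conditioned
    model would propose; here they are simply passed in by hand.
    """
    return [
        (skill, obj)
        for skill, obj in full_actions()
        if skill in intent_skills and obj in intent_objects
    ]


def num_plans(branching, depth):
    """Number of action sequences of length `depth` in exhaustive search."""
    return branching ** depth


full = full_actions()
# Assumed intent: "put the red block in the bowl".
partial = afforded_actions({"pick", "place_on"}, {"red_block", "bowl"})

print(len(full), len(partial))                              # 16 4
print(num_plans(len(full), 3), num_plans(len(partial), 3))  # 4096 64
```

In this toy setting, restricting the search to afforded actions cuts the branching factor from 16 to 4, so an exhaustive depth-3 search enumerates 64 sequences instead of 4096. The numbers depend entirely on the assumed domain and say nothing about the magnitudes reported in the paper.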