Synthesizing world models for bilevel planning

📅 2025-03-26
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing theory-based reinforcement learning (TBRL) approaches suffer from the limited expressivity of formal theory languages and non-scalable planning algorithms, making it hard to achieve sample efficiency and generalization simultaneously. This paper introduces TheoryCoder, a framework that (1) constructs a hierarchical causal world model enabling dual-level decision-making: low-level action execution and high-level semantic planning; (2) leverages large language models to synthesize environment-specific Python transition programs and to define generalizable high-level action abstractions (e.g., "move to"); and (3) supports automatic program grounding and execution. TheoryCoder is the first to integrate cognitively inspired causal theory representation, hierarchical program synthesis, and dual-level planning in a unified architecture. Empirical evaluation on diverse grid-world tasks demonstrates substantial improvements over end-to-end policy methods. Ablation studies confirm that hierarchical abstraction critically enhances both generalization capability and planning efficiency.

๐Ÿ“ Abstract
Modern reinforcement learning (RL) systems have demonstrated remarkable capabilities in complex environments, such as video games. However, they still fall short of achieving human-like sample efficiency and adaptability when learning new domains. Theory-based reinforcement learning (TBRL) is an algorithmic framework specifically designed to address this gap. Modeled on cognitive theories, TBRL leverages structured, causal world models ("theories") as forward simulators for use in planning, generalization and exploration. Although current TBRL systems provide compelling explanations of how humans learn to play video games, they face several technical limitations: their theory languages are restrictive, and their planning algorithms are not scalable. To address these challenges, we introduce TheoryCoder, an instantiation of TBRL that exploits hierarchical representations of theories and efficient program synthesis methods for more powerful learning and planning. TheoryCoder equips agents with general-purpose abstractions (e.g., "move to"), which are then grounded in a particular environment by learning a low-level transition model (a Python program synthesized from observations by a large language model). A bilevel planning algorithm can exploit this hierarchical structure to solve large domains. We demonstrate that this approach can be successfully applied to diverse and challenging grid-world games, where approaches based on directly synthesizing a policy perform poorly. Ablation studies demonstrate the benefits of using hierarchical abstractions.
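To make the abstract concrete: the low-level transition model it describes is a Python program predicting the next state from the current state and an action. The sketch below is a hand-written stand-in for the kind of program an LLM might synthesize; the state encoding, the `transition` signature, and the wall-blocking rule are illustrative assumptions, not the paper's actual code.

```python
# Illustrative sketch of a synthesized low-level transition program
# for a toy grid world (assumed encoding, not the paper's code).

def transition(state, action):
    """Predict the next state.

    `state` maps names to values: "agent" is an (x, y) cell and
    "walls" is a set of blocked cells. `action` is one of "up",
    "down", "left", "right"; moves into walls leave the agent put.
    """
    deltas = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
    dx, dy = deltas[action]
    ax, ay = state["agent"]
    target = (ax + dx, ay + dy)
    next_state = dict(state)  # copy; the input state is not mutated
    if target not in state.get("walls", set()):
        next_state["agent"] = target
    return next_state
```

Because such a program is a forward simulator, a planner can roll it out to evaluate candidate action sequences without touching the real environment, which is what makes the "theory" usable for planning.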
Problem

Research questions and friction points this paper is trying to address.

Enhance sample efficiency in reinforcement learning
Overcome restrictive theory languages in TBRL
Improve scalability of planning algorithms in TBRL
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical representations for scalable planning
Program synthesis using large language models
Bilevel planning algorithm for complex domains
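A minimal sketch of how the contributions listed above could fit together, under assumed encodings: the high level sequences "move to" subgoals, and each subgoal is grounded by breadth-first search over the low-level transition model. All names, the grid encoding, and the search choice are illustrative, not the paper's implementation.

```python
# Hypothetical bilevel planning sketch over a toy grid world.
from collections import deque

def transition(state, action):
    # Stand-in for the synthesized low-level model: moves blocked by
    # walls and by the grid boundary given in state["size"].
    deltas = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
    dx, dy = deltas[action]
    x, y = state["agent"]
    nxt = (x + dx, y + dy)
    w, h = state["size"]
    new = dict(state)
    if 0 <= nxt[0] < w and 0 <= nxt[1] < h and nxt not in state["walls"]:
        new["agent"] = nxt
    return new

def move_to(state, goal):
    """Ground the high-level "move to" abstraction: BFS over the
    low-level transition model to reach a target cell."""
    frontier = deque([(state["agent"], [])])
    seen = {state["agent"]}
    while frontier:
        pos, plan = frontier.popleft()
        if pos == goal:
            return plan
        for a in ("up", "down", "left", "right"):
            nxt = transition({**state, "agent": pos}, a)["agent"]
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [a]))
    return None  # goal unreachable under the model

def bilevel_plan(state, subgoals):
    """High level: sequence "move to" subgoals; low level: expand
    each subgoal into primitive actions via move_to."""
    actions = []
    for goal in subgoals:
        step = move_to(state, goal)
        if step is None:
            return None
        for a in step:
            state = transition(state, a)
        actions.extend(step)
    return actions
```

The design point this illustrates is the division of labor: the high-level planner never reasons about individual moves, and the low-level search never reasons about which subgoals matter, which is why the hierarchy scales to larger domains than flat search.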