Learning Bilevel Policies over Symbolic World Models for Long-Horizon Planning

📅 2026-05-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of enabling embodied agents to combine high-level planning with fine-grained low-level control in long-horizon tasks. It proposes BISON, a bilevel policy framework that integrates symbolic planning with neural imitation learning. The low-level policy learns motor control from continuous demonstrations, while the high-level policy is a symbolic policy built from symbolic abstractions of those demonstrations combined with inductive generalization, which yields interpretable long-horizon plans. On extended MetaWorld benchmarks, BISON generalizes to longer horizons and larger numbers of objects than vision-language-action (VLA) and end-to-end baselines can solve, and is more time and memory efficient in training and inference. When low-level execution is ignored, the high-level policies alone solve problems with 10,000 relevant objects in under a minute.

📝 Abstract

We tackle the challenge of building embodied AI agents that can reliably solve long-horizon planning problems. Imitation learning from demonstrations has shown itself to be effective in training robots to solve a diversity of complex tasks requiring fine motor control and manipulation over low-level (LL), continuous environments. Yet, it remains a difficult endeavour to generate long-horizon plans from imitation learning alone. In contrast, high-level (HL), symbolic abstractions facilitate efficient and interpretable long-horizon planning. We propose to combine the strengths of LL imitation learning for manipulation and control, and HL symbolic abstractions for long-horizon planning. We realise this idea via bilevel policies of the form $(\pi^{\mathrm{hl}}, \pi^{\mathrm{ll}})$, consisting of a neural policy $\pi^{\mathrm{ll}}$ learned from LL demonstrations, and an HL symbolic policy $\pi^{\mathrm{hl}}$ that is constructed from symbolic abstractions of the LL demonstrations combined with inductive generalisation. We implement these ideas in the BISON system. Experiments on extended MetaWorld benchmarks demonstrate that BISON generalises to long horizons and problems with greater numbers of objects than those solved by VLA and end-to-end methods, and is more time and memory efficient in training and inference. Notably, when ignoring LL execution, BISON's HL policies can solve HL problems with 10,000 relevant objects in under a minute. Project page: https://dillonzchen.github.io/bison
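
To make the bilevel form $(\pi^{\mathrm{hl}}, \pi^{\mathrm{ll}})$ concrete, the following is a minimal illustrative sketch of how such a pair might interact at execution time. It is not BISON's implementation: the predicate names (`on_table`, `in_bin`), the `pick_place` action, and the functions `hl_policy`, `ll_policy`, and `run_bilevel` are all hypothetical, and the LL policy is a stand-in that applies the symbolic effect directly rather than running a learned neural controller.

```python
from typing import FrozenSet, List, Optional, Tuple

Atom = Tuple[str, str]            # hypothetical ground atom, e.g. ("on_table", "obj0")
SymbolicState = FrozenSet[Atom]
HLAction = Tuple[str, str]        # hypothetical HL action, e.g. ("pick_place", "obj0")


def hl_policy(state: SymbolicState, goal: SymbolicState) -> Optional[HLAction]:
    """Toy symbolic HL policy: pick any goal atom that does not hold yet.

    The rule is independent of how many objects exist, which is the kind of
    size-invariance a generalised symbolic policy is meant to provide.
    """
    pending = sorted(goal - state)
    if not pending:
        return None               # every goal atom already holds
    _, obj = pending[0]
    return ("pick_place", obj)


def ll_policy(action: HLAction, state: SymbolicState) -> SymbolicState:
    """Stand-in for the neural LL policy (assumption: execution always succeeds).

    A real LL policy would emit continuous motor commands; here we only
    simulate the symbolic effect of the skill.
    """
    _, obj = action
    return (state - {("on_table", obj)}) | {("in_bin", obj)}


def run_bilevel(
    state: SymbolicState, goal: SymbolicState, max_steps: int = 100
) -> Tuple[List[HLAction], SymbolicState]:
    """Alternate HL action selection and LL execution until the goal holds."""
    executed: List[HLAction] = []
    for _ in range(max_steps):
        action = hl_policy(state, goal)
        if action is None:
            break
        state = ll_policy(action, state)
        executed.append(action)
    return executed, state


if __name__ == "__main__":
    objects = [f"obj{i}" for i in range(3)]
    start = frozenset(("on_table", o) for o in objects)
    target = frozenset(("in_bin", o) for o in objects)
    plan, final = run_bilevel(start, target)
    print(plan)
    print(target <= final)
```

In this toy setting the HL policy issues one symbolic action per object and the loop terminates once every goal atom holds. The separation mirrors the abstract's division of labour: the HL side decides what to do next over symbols, and the LL side is responsible for how to carry it out.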

Problem

Research questions and friction points this paper is trying to address.

long-horizon planning

embodied AI

symbolic world models

bilevel policies

imitation learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

bilevel policies

symbolic abstraction

long-horizon planning