🤖 AI Summary
In long-horizon robotic tasks, sparse rewards and continuous state-action spaces impede symbolic modeling: existing approaches either rely solely on LLM prompting, lacking empirical grounding, or learn exclusively from demonstrations, missing high-level semantic priors. This paper proposes UniPred, the first unified framework integrating LLM-guided and perception-driven predicate invention. It features a two-tier architecture: (i) an upper tier in which an LLM generates interpretable predicate hypotheses, and (ii) a lower tier that grounds these predicates via vision foundation model features and neural predicate classifiers. A closed-loop collaboration mechanism and a non-STRIPS predicate evaluation method fuse the LLM's semantic priors with experience-based learning. Evaluated on five simulated tasks and one real-robot task, UniPred achieves 2–4× higher task success rates than pure LLM-based methods and 3–4× better sample efficiency than purely data-driven approaches.
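The closed-loop collaboration between the two tiers can be sketched roughly as follows. This is a minimal illustrative mock, not the paper's implementation: all function names are assumptions, the LLM proposer is stubbed with a fixed hypothesis set, and the grounding step simply counts effect matches in demonstrations in place of training neural classifiers on vision foundation model features.

```python
# Hypothetical sketch of UniPred's two-tier predicate-invention loop.
# Names and scoring are illustrative stand-ins, not the paper's code.

def llm_propose_predicates(feedback):
    # Upper tier (stub): an LLM would propose interpretable predicate
    # hypotheses; here a fixed set is pruned by grounding feedback.
    base = {"On(a,b)", "Holding(a)", "Clear(b)"}
    return base - feedback["rejected"]

def ground_predicate(name, demos):
    # Lower tier (stub): in the paper this would train a neural classifier
    # on vision-foundation-model features; here we just score how often the
    # predicate appears in demonstrated effects.
    matches = sum(1 for d in demos if name in d["effects"])
    return matches / max(len(demos), 1)

def unipred_loop(demos, rounds=3, threshold=0.5):
    feedback = {"rejected": set()}
    scores = {}
    for _ in range(rounds):
        hypotheses = llm_propose_predicates(feedback)
        scores = {h: ground_predicate(h, demos) for h in hypotheses}
        # Closed loop: poorly grounded predicates are fed back to the
        # upper tier so the LLM can revise its hypotheses.
        feedback["rejected"] |= {h for h, s in scores.items() if s < threshold}
    return {h: s for h, s in scores.items() if s >= threshold}

demos = [
    {"effects": {"On(a,b)", "Clear(b)"}},
    {"effects": {"On(a,b)"}},
]
print(unipred_loop(demos))
```

The key design point the sketch tries to convey is the direction of information flow: semantic priors flow top-down as hypotheses, while grounding evidence flows bottom-up as feedback that refines those hypotheses over successive rounds.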
📝 Abstract
Long-horizon robotic tasks are hard due to continuous state-action spaces and sparse feedback. Symbolic world models help by decomposing tasks into discrete predicates that capture object properties and relations. Existing methods learn predicates either top-down, by prompting foundation models without data grounding, or bottom-up, from demonstrations without high-level priors. We introduce UniPred, a bilevel learning framework that unifies both. UniPred uses large language models (LLMs) to propose predicate effect distributions that supervise neural predicate learning from low-level data, while learned feedback iteratively refines the LLM hypotheses. Leveraging strong visual foundation model features, UniPred learns robust predicate classifiers in cluttered scenes. We further propose a predicate evaluation method that supports symbolic models beyond STRIPS assumptions. Across five simulated domains and one real-robot domain, UniPred achieves 2-4 times higher success rates than top-down methods and 3-4 times faster learning than bottom-up approaches, advancing scalable and flexible symbolic world modeling for robotics.