Unifying Deep Predicate Invention with Pre-trained Foundation Models

📅 2025-12-19
🤖 AI Summary
In long-horizon robotic tasks, sparse rewards and continuous state-action spaces impede symbolic modeling, as existing approaches either rely solely on LLM prompting—lacking empirical grounding—or learn exclusively from demonstrations—missing high-level semantic priors. This paper proposes UniPred, the first unified framework integrating LLM-guided and perception-driven predicate invention. It features a two-tier architecture: (i) an upper tier where an LLM generates interpretable predicate hypotheses, and (ii) a lower tier that grounds these predicates via vision foundation model features and neural predicate classifiers. A closed-loop collaboration mechanism and non-STRIPS predicate evaluation are introduced to synergistically fuse semantic priors with experience-based learning. Evaluated on five simulated and one real-robot task, UniPred achieves 2–4× higher task success rates than pure LLM-based methods and 3–4× improved sample efficiency over purely data-driven approaches.

📝 Abstract
Long-horizon robotic tasks are hard due to continuous state-action spaces and sparse feedback. Symbolic world models help by decomposing tasks into discrete predicates that capture object properties and relations. Existing methods learn predicates either top-down, by prompting foundation models without data grounding, or bottom-up, from demonstrations without high-level priors. We introduce UniPred, a bilevel learning framework that unifies both. UniPred uses large language models (LLMs) to propose predicate effect distributions that supervise neural predicate learning from low-level data, while learned feedback iteratively refines the LLM hypotheses. Leveraging strong visual foundation model features, UniPred learns robust predicate classifiers in cluttered scenes. We further propose a predicate evaluation method that supports symbolic models beyond STRIPS assumptions. Across five simulated domains and one real-robot domain, UniPred achieves 2–4 times higher success rates than top-down methods and 3–4 times faster learning than bottom-up approaches, advancing scalable and flexible symbolic world modeling for robotics.
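The closed-loop bilevel structure described in the abstract (LLM proposals on top, empirically grounded classifiers below, with feedback flowing back up) can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation; every function name, predicate string, and threshold here is a hypothetical stand-in.

```python
import random

# Illustrative sketch of UniPred's closed-loop bilevel structure as
# described in the abstract -- NOT the authors' code. All names,
# predicates, and thresholds below are hypothetical stand-ins.

CANDIDATES = ["Holding(obj)", "On(obj, surface)", "Clear(obj)"]

def propose_predicates(rejected):
    """Upper tier: stand-in for the LLM proposing interpretable
    predicate hypotheses, skipping ones already refuted by data."""
    return [p for p in CANDIDATES if p not in rejected]

def ground_predicate(predicate):
    """Lower tier: stand-in for fitting a neural classifier on
    vision-foundation-model features; returns a mock validation score."""
    rng = random.Random(sum(map(ord, predicate)))  # deterministic per name
    return round(rng.uniform(0.5, 1.0), 2)

def unipred_loop(iterations=3, threshold=0.7):
    """Closed loop: grounded scores feed back to refine LLM proposals."""
    rejected, accepted = set(), {}
    for _ in range(iterations):
        for pred in propose_predicates(rejected):
            score = ground_predicate(pred)
            if score >= threshold:
                accepted[pred] = score   # empirically grounded predicate
            else:
                rejected.add(pred)       # feedback to the upper tier
    return accepted

print(unipred_loop())
```

The key design point mirrored here is that neither tier works alone: the upper tier supplies semantic priors, while the lower tier's scores decide which hypotheses survive, which is how the paper claims to combine LLM guidance with experience-based learning.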
Problem

Research questions and friction points this paper is trying to address.

Unifies top-down and bottom-up predicate learning for robotics
Learns robust symbolic world models from low-level data and LLMs
Enables scalable symbolic reasoning beyond STRIPS assumptions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified bilevel framework combining LLM proposals with neural learning
Leverages visual foundation models for robust predicate classifiers
Introduces evaluation method extending symbolic models beyond STRIPS
Qianwei Wang
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
Bowen Li
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
Zhanpeng Luo
Department of Computer Science, University of Pittsburgh
Yifan Xu
Computer Science and Engineering Division, University of Michigan
Alexander Gray
Centaur AI Institute
Tom Silver
Assistant Professor at Princeton
Planning, Learning, Robotics
Sebastian Scherer
Associate Research Professor, Carnegie Mellon University
Robotics, UAS, obstacle avoidance, perception, planning
Katia Sycara
Professor, School of Computer Science, Carnegie Mellon University
Artificial Intelligence, Multi-Robot Systems, Human Robot Interaction, Multi-Agent Systems, Semantic Web
Y
Yaqi Xie
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA