Verifiably Following Complex Robot Instructions with Foundation Models

📅 2024-02-18
🏛️ arXiv.org
📈 Citations: 10
Influential: 0
🤖 AI Summary
This work addresses the challenge of enabling robots to reliably execute open-ended, complex natural-language instructions in real-world environments. The authors propose LIMP (Language Instruction grounding for Motion Planning), a verifiable execution framework that operates without prebuilt semantic maps and supports flexible constraints and arbitrary landmark references. Its core contribution is a symbolic instruction representation that explicitly models alignment between user intent and robot behavior, integrating foundation-model-based semantic understanding, symbolic grounding, motion planning, and formal verification into an interpretable, end-to-end pipeline. Evaluated on 150 instructions across five real-world environments, LIMP performs comparably to state-of-the-art baselines on standard open-vocabulary tasks and achieves a 79% success rate on complex spatiotemporal instructions, far surpassing the strongest baseline (38%).

📝 Abstract
When instructing robots, users want to flexibly express constraints, refer to arbitrary landmarks, and verify robot behavior, while robots must disambiguate instructions into specifications and ground instruction referents in the real world. To address this problem, we propose Language Instruction grounding for Motion Planning (LIMP), an approach that enables robots to verifiably follow complex, open-ended instructions in real-world environments without prebuilt semantic maps. LIMP constructs a symbolic instruction representation that reveals the robot's alignment with an instructor's intended motives and affords the synthesis of correct-by-construction robot behaviors. We conduct a large-scale evaluation of LIMP on 150 instructions across five real-world environments, demonstrating its versatility and ease of deployment in diverse, unstructured domains. LIMP performs comparably to state-of-the-art baselines on standard open-vocabulary tasks and additionally achieves a 79% success rate on complex spatiotemporal instructions, significantly outperforming baselines that only reach 38%. See supplementary materials and demo videos at https://robotlimp.github.io
Problem

Research questions and friction points this paper is trying to address.

Robots must disambiguate and ground complex user instructions.
Need flexible constraint expression and behavior verification for robots.
Requires real-world instruction following without prebuilt semantic maps.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Constructs a symbolic instruction representation that reveals alignment between user intent and robot behavior.
Synthesizes correct-by-construction robot behaviors from the grounded specification.
Operates in real-world environments without prebuilt semantic maps.
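To make the "correct-by-construction" idea concrete, here is a minimal illustrative sketch, not the paper's actual representation or code: it checks a robot trajectory against a simple temporal specification of the flavor LIMP's symbolic representations express, e.g. "eventually reach the kitchen, always avoiding the wet floor" (F(kitchen) AND G(not wet_floor)). All proposition names and helper functions here are hypothetical.

```python
# Illustrative sketch only: finite-trace check of a simple temporal spec.
# A trajectory is a list of timesteps, each a set of true propositions.

def eventually(trace, atom):
    """F(atom): the atom holds at some step of the trace."""
    return any(atom in step for step in trace)

def always_not(trace, atom):
    """G(not atom): the atom never holds at any step of the trace."""
    return all(atom not in step for step in trace)

def satisfies(trace, goal, hazard):
    """Check F(goal) AND G(not hazard) over a finite trace."""
    return eventually(trace, goal) and always_not(trace, hazard)

# Hypothetical trajectories over made-up propositions:
good_trace = [{"hallway"}, {"hallway"}, {"kitchen"}]
bad_trace = [{"hallway"}, {"wet_floor", "hallway"}, {"kitchen"}]

print(satisfies(good_trace, "kitchen", "wet_floor"))  # True
print(satisfies(bad_trace, "kitchen", "wet_floor"))   # False
```

A verifiable pipeline in this spirit would reject the second trajectory before execution, since it violates the safety constraint even though it reaches the goal; the actual system grounds such formulas from natural language with foundation models and synthesizes plans that satisfy them by construction.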