VLM-driven Behavior Tree for Context-aware Task Planning

📅 2025-01-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address poor adaptability and rigid condition specification in vision-based robot task planning for unstructured environments, this paper proposes a Vision-Language Model (VLM)-driven behavior tree framework. Its core contribution is a self-prompting visual condition node: free-text visual conditions are embedded directly into the behavior tree, and multimodal VLMs (e.g., LLaVA, Qwen-VL) perform vision-language alignment at run time to evaluate each condition's truth value, enabling context-aware online planning and on-the-fly task editing. By integrating dynamic prompt engineering with real-time visual reasoning, the framework is deployed end-to-end in a real-world cafe setting, validating a closed-loop workflow from vision-and-semantics-driven task generation and editing through execution. Experiments demonstrate substantial improvements in task generalization and operational robustness for robots operating in complex, dynamic environments.


📝 Abstract
The use of Large Language Models (LLMs) for generating Behavior Trees (BTs) has recently gained attention in the robotics community, yet remains in its early stages of development. In this paper, we propose a novel framework that leverages Vision-Language Models (VLMs) to interactively generate and edit BTs that address visual conditions, enabling context-aware robot operations in visually complex environments. A key feature of our approach lies in the conditional control through self-prompted visual conditions. Specifically, the VLM generates BTs with visual condition nodes, where conditions are expressed as free-form text. Another VLM process integrates the text into its prompt and evaluates the conditions against real-world images during robot execution. We validated our framework in a real-world cafe scenario, demonstrating both its feasibility and limitations.
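The abstract's key mechanism, condition nodes whose predicate is free-form text evaluated by a VLM against the current camera image, can be sketched as below. This is an illustrative minimal sketch, not the paper's implementation: the `VisualCondition` and `Sequence` classes, the prompt template, and the `ask_vlm` callable (here replaced by a stub standing in for a real model such as LLaVA or Qwen-VL) are all assumptions for demonstration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass
class VisualCondition:
    """BT leaf whose condition is free-form text evaluated by a VLM."""
    condition_text: str
    ask_vlm: Callable[[str, bytes], bool]  # (prompt, image) -> truth value

    def tick(self, image: bytes) -> Status:
        # Self-prompting: the free-text condition generated with the tree
        # is embedded into the prompt of a second VLM query at execution time.
        prompt = (
            "Look at the image and answer strictly 'yes' or 'no'. "
            f"Condition: {self.condition_text}"
        )
        return Status.SUCCESS if self.ask_vlm(prompt, image) else Status.FAILURE


@dataclass
class Sequence:
    """Standard BT sequence node: fails on the first failing child."""
    children: List[VisualCondition]

    def tick(self, image: bytes) -> Status:
        for child in self.children:
            if child.tick(image) is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS


def stub_vlm(prompt: str, image: bytes) -> bool:
    # Stand-in for a real multimodal model call; a deployment would send
    # the prompt and image to LLaVA/Qwen-VL and parse the yes/no answer.
    return b"cup" in image if "cup" in prompt else True


tree = Sequence([
    VisualCondition("a cup is on the counter", stub_vlm),
    VisualCondition("the counter is clear of obstacles", stub_vlm),
])
print(tree.tick(b"scene: cup on counter"))  # Status.SUCCESS
```

Because the conditions are plain text rather than hand-coded predicates, the generating VLM can insert or edit them without touching robot-side code, which is what enables the interactive BT editing the abstract describes.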
Problem

Research questions and friction points this paper is trying to address.

Robotics
Visual Information Processing
Decision Making
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language Models
Dynamic Task Planning
Adaptive Decision Making