A Unified Framework for Real-Time Failure Handling in Robotics Using Vision-Language Models, Reactive Planner and Behavior Trees

📅 2025-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Robotic task execution frequently fails in real time due to environmental disturbances or sensor anomalies; conventional recovery approaches—relying on predefined policies or manual intervention—exhibit poor adaptability. This paper proposes a unified online failure recovery framework integrating vision-language models (VLMs), reactive planners, and behavior trees. Its core contribution is the first VLM-driven mechanism for scene understanding and skill generation, augmented by structured scene graph modeling and execution history tracking. This enables pre-execution condition checking, automatic prerequisite completion, and online synthesis of novel skills. Evaluated on an ABB YuMi robot and the AI2-THOR simulator, the framework significantly improves success rates across benchmark tasks—including peg insertion, object sorting, and drawer placement—while demonstrating superior robustness and dynamic adaptability compared to single-stage baseline methods.

📝 Abstract
Robotic systems often face execution failures due to unexpected obstacles, sensor errors, or environmental changes. Traditional failure recovery methods rely on predefined strategies or human intervention, making them less adaptable. This paper presents a unified failure recovery framework that combines Vision-Language Models (VLMs), a reactive planner, and Behavior Trees (BTs) to enable real-time failure handling. Our approach includes pre-execution verification, which checks for potential failures before execution, and reactive failure handling, which detects and corrects failures during execution by verifying existing BT conditions, adding missing preconditions and, when necessary, generating new skills. The framework uses a scene graph for structured environmental perception and an execution history for continuous monitoring, enabling context-aware and adaptive failure handling. We evaluate our framework through real-world experiments with an ABB YuMi robot on tasks like peg insertion, object sorting, and drawer placement, as well as in AI2-THOR simulator. Compared to using pre-execution and reactive methods separately, our approach achieves higher task success rates and greater adaptability. Ablation studies highlight the importance of VLM-based reasoning, structured scene representation, and execution history tracking for effective failure recovery in robotics.
Problem

Research questions and friction points this paper is trying to address.

How to handle robotic execution failures in real time without relying on predefined recovery strategies or human intervention.
How to combine Vision-Language Models, a reactive planner, and Behavior Trees into a single adaptive recovery pipeline.
How to raise task success rates through context-aware failure detection, using structured scene representation and execution history.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines Vision-Language Models with reactive planner
Uses Behavior Trees for real-time failure handling
Integrates scene graph and execution history tracking
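The abstract describes two stages: pre-execution verification of Behavior Tree conditions, and reactive handling that inserts missing precondition skills when a condition fails at run time. The sketch below illustrates that control flow with a minimal BT in Python. All class names, the `fix_for` repair lookup, and the peg-insertion toy world are illustrative assumptions, not the authors' actual API; the repair table stands in for the paper's VLM-driven skill generation.

```python
# Minimal behavior-tree sketch of the two recovery stages described above:
# condition checking before/during execution, and reactive insertion of a
# repair skill when a precondition fails. Names are hypothetical.

class Condition:
    def __init__(self, name, predicate):
        self.name, self.predicate = name, predicate
    def tick(self, world):
        return "SUCCESS" if self.predicate(world) else "FAILURE"

class Action:
    def __init__(self, name, effect):
        self.name, self.effect = name, effect
    def tick(self, world):
        self.effect(world)
        return "SUCCESS"

class Sequence:
    def __init__(self, children):
        self.children = list(children)
    def tick(self, world):
        # Tick children left to right; fail fast on the first failure.
        for child in self.children:
            status = child.tick(world)
            if status != "SUCCESS":
                return status
        return "SUCCESS"

def run_with_recovery(tree, world, fix_for, max_retries=5):
    """Tick the tree; when a condition fails, splice the matching repair
    action (stand-in for a VLM-generated skill) before it and retry."""
    for _ in range(max_retries):
        if tree.tick(world) == "SUCCESS":
            return "SUCCESS"
        # Locate the failed condition and insert its repair action.
        for i, child in enumerate(tree.children):
            if isinstance(child, Condition) and child.tick(world) == "FAILURE":
                tree.children.insert(i, fix_for(child.name))
                break
    return "FAILURE"

# Toy peg-insertion task: placing the peg requires holding it first.
world = {"holding_peg": False, "peg_placed": False}
tree = Sequence([
    Condition("holding_peg", lambda w: w["holding_peg"]),
    Action("place_peg", lambda w: w.update(peg_placed=True)),
])
repairs = {"holding_peg": Action("grasp_peg",
                                 lambda w: w.update(holding_peg=True))}
result = run_with_recovery(tree, world, repairs.__getitem__)
```

Here the initial tick fails on the `holding_peg` condition, the `grasp_peg` repair is spliced in before it, and the retried tree succeeds — mirroring the paper's "adding missing preconditions" step, with the lookup table playing the role of online skill synthesis.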
Faseeh Ahmad
Lund University, Lund, Sweden
Hashim Ismail
Lund University, Lund, Sweden
Jonathan Styrud
ABB, KTH
Robotics, Control, Planning, AI ethics
Maj Stenmark
Lund University, Lund, Sweden
Volker Krueger
Lund University
Robotics, computer vision, machine intelligence, image processing