A Backbone for Long-Horizon Robot Task Understanding

📅 2024-08-02

🏛️ IEEE Robotics and Automation Letters

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

To address the poor interpretability, weak generalization, and unpredictable outcomes of end-to-end learning in long-horizon robotic tasks, this paper proposes the Therblig-Based Backbone Framework (TBBF). First, it introduces a task decomposition paradigm grounded in therblig motion primitives, which splits a demonstration into semantically interpretable temporal segments. Second, it designs the Meta-RGate SynerFusion (MGSF) network for therblig segmentation and an Action Registration (ActionREG) module that encodes the extracted knowledge into the image, supporting generalization from a one-shot demonstration. Third, it integrates an LLM-Alignment Policy for Visual Correction (LAP-VC), which keeps action registration precise so that trajectories can be transferred to novel scenes. Experiments report a therblig segmentation recall of 94.37%. On real robots, TBBF achieves task success rates of 94.4% in simple scenes and 80% in complex scenes, which the summary reads as improved robustness and cross-task generalization relative to end-to-end approaches.

📝 Abstract

End-to-end robot learning, particularly for long-horizon tasks, often results in unpredictable outcomes and poor generalization. To address these challenges, we propose a novel Therblig-Based Backbone Framework (TBBF) as a fundamental structure to enhance interpretability, data efficiency, and generalization in robotic systems. TBBF utilizes expert demonstrations to enable therblig-level task decomposition, facilitate efficient action-object mapping, and generate adaptive trajectories for new scenarios. The approach consists of two stages: offline training and online testing. During the offline training stage, we develop the Meta-RGate SynerFusion (MGSF) network for accurate therblig segmentation across various tasks. In the online testing stage, after a one-shot demonstration of a new task is collected, our MGSF network extracts high-level knowledge, which is then encoded into the image using Action Registration (ActionREG). Additionally, a Large Language Model (LLM)-Alignment Policy for Visual Correction (LAP-VC) is employed to ensure precise action registration, facilitating trajectory transfer in novel robot scenarios. Experimental results validate these methods, achieving 94.37% recall in therblig segmentation and success rates of 94.4% and 80% in real-world online robot testing for simple and complex scenarios, respectively.
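As a rough illustration of what a segmentation recall figure like the one above measures, the sketch below computes frame-wise recall, TP / (TP + FN), per class and averaged over classes. This is only an assumption about the metric: the abstract does not say whether recall is computed per frame or per segment, or how classes are averaged. The function names, the therblig label names, and the toy sequences are hypothetical and not taken from the paper.

```python
from collections import Counter


def per_class_recall(true_labels, pred_labels):
    """Frame-wise recall for each class: TP / (TP + FN)."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("label sequences must not be empty")
    # Number of ground-truth frames per class, i.e. TP + FN.
    support = Counter(true_labels)
    # Frames where the prediction matches the ground truth, i.e. TP.
    hits = Counter(t for t, p in zip(true_labels, pred_labels) if t == p)
    return {cls: hits[cls] / count for cls, count in support.items()}


def macro_recall(true_labels, pred_labels):
    """Unweighted mean of the per-class recalls."""
    recalls = per_class_recall(true_labels, pred_labels)
    return sum(recalls.values()) / len(recalls)


# Hypothetical per-frame labels for one short demonstration.
truth = ["transport_empty", "transport_empty", "grasp", "grasp",
         "transport_loaded", "transport_loaded", "transport_loaded",
         "release_load"]
pred = ["transport_empty", "grasp", "grasp", "grasp",
        "transport_loaded", "transport_loaded", "transport_empty",
        "release_load"]

if __name__ == "__main__":
    for cls, value in per_class_recall(truth, pred).items():
        print(f"{cls}: {value:.3f}")
    print(f"macro recall: {macro_recall(truth, pred):.3f}")
```

Macro averaging weights every therblig class equally regardless of how many frames it covers; a frame-weighted (micro) average would be dominated by the longer segments.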

Problem

Research questions and friction points this paper is trying to address.

Improves interpretability and generalization in robot learning

Enhances data efficiency for long-horizon robotic tasks

Facilitates adaptive trajectory generation for new scenarios

Innovation

Methods, ideas, or system contributions that make the work stand out.

Therblig-Based Backbone Framework enhances robot task understanding

Meta-RGate SynerFusion network for accurate task segmentation

LLM-Alignment Policy ensures precise action registration

🔎 Similar Papers

Long-Horizon Planning for Multi-Agent Robots in Partially Observable Environments