🤖 AI Summary
To address the poor interpretability, weak generalization, and unpredictable outcomes of end-to-end learning in long-horizon robotic tasks, this paper proposes the Therblig-Based Backbone Framework (TBBF). First, it introduces a task decomposition paradigm grounded in therblig motion primitives, which splits a demonstration into semantically interpretable temporal segments. Second, it designs the Meta-RGate SynerFusion (MGSF) network and an Action Registration (ActionREG) module to support generalization from a one-shot demonstration. Third, it integrates an LLM-Alignment Policy for Visual Correction (LAP-VC), which corrects action registration online. Experiments report a therblig segmentation recall of 94.37%. On real robots, TBBF achieves task success rates of 94.4% in simple scenes and 80% in complex scenes.
📝 Abstract
End-to-end robot learning, particularly for long-horizon tasks, often results in unpredictable outcomes and poor generalization. To address these challenges, we propose a novel Therblig-Based Backbone Framework (TBBF) as a fundamental structure to enhance interpretability, data efficiency, and generalization in robotic systems. TBBF utilizes expert demonstrations to enable therblig-level task decomposition, facilitate efficient action-object mapping, and generate adaptive trajectories for new scenarios. The approach consists of two stages: offline training and online testing. In the offline training stage, we develop the Meta-RGate SynerFusion (MGSF) network for accurate therblig segmentation across various tasks. In the online testing stage, after a one-shot demonstration of a new task is collected, the MGSF network extracts high-level knowledge, which is then encoded into the image using Action Registration (ActionREG). Additionally, the Large Language Model (LLM)-Alignment Policy for Visual Correction (LAP-VC) is employed to ensure precise action registration, facilitating trajectory transfer in novel robot scenarios. Experimental results validate these methods, achieving 94.37% recall in therblig segmentation and success rates of 94.4% and 80% in real-world online robot testing for simple and complex scenarios, respectively.
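To make the segmentation recall figure concrete, the following is a minimal sketch of how recall could be computed for a labeled demonstration. It assumes frame-level labels and a macro average over therblig classes; the abstract does not state how the reported 94.37% was computed, so the granularity, the averaging, the function names, and the example label set are all illustrative assumptions.

```python
from collections import Counter


def per_class_recall(true_labels, pred_labels):
    """Return {label: recall} for every label present in the ground truth.

    Recall for a class is the fraction of its ground-truth frames that
    the segmenter assigned to that same class.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("label sequences must not be empty")
    support = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, pred_labels) if t == p)
    return {label: hits[label] / support[label] for label in support}


def macro_recall(true_labels, pred_labels):
    """Unweighted mean of the per-class recalls (assumed averaging scheme)."""
    recalls = per_class_recall(true_labels, pred_labels)
    return sum(recalls.values()) / len(recalls)


# Hypothetical frame-level labels for one short demonstration.
truth = ["reach", "reach", "grasp", "grasp", "move", "move", "move", "release"]
pred = ["reach", "grasp", "grasp", "grasp", "move", "move", "reach", "release"]

if __name__ == "__main__":
    for label, value in per_class_recall(truth, pred).items():
        print(f"{label}: {value:.3f}")
    print(f"macro recall: {macro_recall(truth, pred):.3f}")
```

Macro averaging weights every therblig class equally regardless of how many frames it covers, so short segments such as a release count as much as long transport segments. A frame-weighted (micro) average would instead favor the long segments.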