Human Demonstrations are Generalizable Knowledge for Robots

📅 2023-12-05
🏛️ arXiv.org
📈 Citations: 6 (influential: 0)
🤖 AI Summary
Existing imitation learning approaches decompose human demonstration videos into raw action sequences, limiting cross-task and cross-object generalization. Method: We propose a hierarchical knowledge distillation framework that extracts three levels of generalizable knowledge from videos: low-level observational representations, mid-level action structures, and high-level task–object patterns. We further design a knowledge-retrieval-augmented LLM-based planner integrated with a closed-loop policy execution module, enabling knowledge-aware reasoning and feedback-driven correction. Contribution/Results: This work introduces the first method to elevate human demonstrations into structured, transferable, general-purpose knowledge. In real-robot experiments across multiple tasks, our approach achieves substantial improvements in cross-instance generalization success rates using only a few demonstrations—effectively overcoming a fundamental generalization bottleneck in imitation learning.
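The three knowledge levels named in the summary map naturally onto a small data model. Below is a minimal sketch of that hierarchy, assuming a relation-string representation the page does not spell out; all class and field names are hypothetical, since no reference implementation is published here.

```python
# Hypothetical data model for the three-level knowledge hierarchy.
# Names and fields are illustrative assumptions, not DigKnow's actual code.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ObservationKnowledge:
    """Low-level: per-frame scene facts extracted from the demonstration video."""
    frame_index: int
    objects: List[str]             # e.g. ["mug", "table"]
    spatial_relations: List[str]   # e.g. ["mug on table"]

@dataclass
class ActionKnowledge:
    """Mid-level: an action inferred from changes between observations."""
    verb: str                      # e.g. "pick_up"
    target: str                    # e.g. "mug"
    preconditions: List[str]       # relations that must hold before the action
    effects: List[str]             # relations expected to hold afterwards

@dataclass
class PatternKnowledge:
    """High-level: task- and object-level regularities distilled from actions."""
    task: str                      # e.g. "tidy the table"
    object_category: str           # e.g. "container"
    action_sequence: List[ActionKnowledge] = field(default_factory=list)
```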
📝 Abstract
Learning from human demonstrations is an emerging trend for designing intelligent robotic systems. However, previous methods typically regard videos as instructions, simply dividing them into action sequences for robotic repetition, which poses obstacles to generalization to diverse tasks or object instances. In this paper, we propose a different perspective, considering human demonstration videos not as mere instructions, but as a source of knowledge for robots. Motivated by this perspective and the remarkable comprehension and generalization capabilities exhibited by large language models (LLMs), we propose DigKnow, a method that DIstills Generalizable KNOWledge with a hierarchical structure. Specifically, DigKnow begins by converting human demonstration video frames into observation knowledge. This knowledge is then subjected to analysis to extract human action knowledge and further distilled into pattern knowledge encompassing task and object instances, resulting in the acquisition of generalizable knowledge with a hierarchical structure. In settings with different tasks or object instances, DigKnow retrieves relevant knowledge for the current task and object instances. Subsequently, the LLM-based planner conducts planning based on the retrieved knowledge, and the policy executes actions in line with the plan to achieve the designated task. Utilizing the retrieved knowledge, we validate and rectify planning and execution outcomes, resulting in a substantial enhancement of the success rate. Experimental results across a range of tasks and scenes demonstrate the effectiveness of this approach in facilitating real-world robots to accomplish tasks with the knowledge derived from human demonstrations.
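The pipeline the abstract describes (retrieve relevant knowledge, plan with an LLM, execute in closed loop with correction) can be made concrete with a toy, runnable sketch. Retrieval is reduced here to keyword overlap, and both the LLM planner and the robot policy are stubbed out; every name below is an illustrative assumption, not DigKnow's actual interface.

```python
# Toy sketch of the retrieve -> plan -> execute/rectify loop from the abstract.
from dataclasses import dataclass

@dataclass
class KnowledgeEntry:
    task: str
    obj: str
    steps: list  # ordered action strings, e.g. ["grasp mug", "place mug on shelf"]

def retrieve(kb: list, task: str, objects: list) -> list:
    """Rank stored knowledge by overlap with the current task and object set."""
    def score(e: KnowledgeEntry) -> int:
        return int(e.task == task) + int(e.obj in objects)
    return sorted((e for e in kb if score(e) > 0), key=score, reverse=True)

def plan_with_llm(task: str, knowledge: list) -> list:
    """Stand-in for the LLM planner: reuse the best-matching step sequence."""
    return knowledge[0].steps if knowledge else []

def robot_step(step: str) -> bool:
    """Mock policy call; a real system would check post-conditions here."""
    print(f"executing: {step}")
    return True

def execute_with_validation(plan: list) -> bool:
    """Closed-loop execution: retry a failed step once as a stand-in for
    the paper's knowledge-based rectification."""
    for step in plan:
        if not robot_step(step) and not robot_step(step):
            return False
    return True

kb = [KnowledgeEntry("tidy table", "mug", ["grasp mug", "place mug on shelf"])]
plan = plan_with_llm("tidy table", retrieve(kb, "tidy table", ["mug", "table"]))
execute_with_validation(plan)
```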
Problem

Research questions and friction points this paper is trying to address.

Generalizing robotic learning from human demonstrations across diverse tasks
Converting video frames into hierarchical knowledge for robot comprehension
Enhancing task success via LLM-based planning and knowledge retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical knowledge distillation from human videos
LLM-based planning with retrieved knowledge
Validation and rectification of execution outcomes (a sketch of the outcome check follows this list)
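The validation step reduces to checking the effects predicted by retrieved knowledge against what is actually observed after execution. A minimal sketch, assuming effects and observations are both represented as relation strings (our assumption, not the paper's stated format):

```python
# Hedged illustration of outcome validation against predicted effects.
def validate_step(expected_effects: set, observed: set) -> bool:
    """A step succeeds if every predicted effect holds in the observed scene."""
    return expected_effects <= observed

# Usage: the plan step "place mug on shelf" should yield the relation "mug on shelf".
assert validate_step({"mug on shelf"}, {"mug on shelf", "shelf on wall"})
assert not validate_step({"mug on shelf"}, {"mug on table"})
```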
👥 Authors
Guangyan Chen (Beijing Institute of Technology)
Te Cui (Beijing Institute of Technology)
Tianxing Zhou (School of Automation, Beijing Institute of Technology, Beijing, 100081, China)
Zicai Peng (School of Automation, Beijing Institute of Technology, Beijing, 100081, China)
Mengxiao Hu (School of Automation, Beijing Institute of Technology, Beijing, 100081, China)
Meiling Wang (School of Automation, Beijing Institute of Technology, Beijing, 100081, China)
Yi Yang (School of Automation, Beijing Institute of Technology, Beijing, 100081, China)
Yufeng Yue (School of Automation, Beijing Institute of Technology, Beijing, 100081, China)