SafeMimic: Towards Safe and Autonomous Human-to-Robot Imitation for Mobile Manipulation

📅 2025-06-18

📈 Citations: 0

✨ Influential: 0

career value

226K/year

🤖 AI Summary

This work addresses the challenge of enabling robots to autonomously and safely learn multi-step mobile manipulation tasks from a single third-person human demonstration video—without human supervision. We propose a progressive safe imitation learning framework featuring three novel mechanisms: (1) retrospective behavior re-planning, (2) morphology-aware action sampling, and (3) ensemble safety Q-functions—jointly enabling semantic-action parsing, egocentric viewpoint translation, morphology adaptation, and real-time safety verification. Our method integrates video segmentation with semantic change detection, third-person-to-egocentric viewpoint transformation, simulation-based safety-constrained modeling, and dynamic grasping with trajectory adjustment. Evaluated on seven cross-user, cross-environment tasks, it significantly outperforms state-of-the-art methods, achieving a +23.6% success rate under single-demonstration learning, markedly reduced failure rates, and automatic policy distillation. To our knowledge, this is the first approach to enable fully autonomous, robust execution and failure recovery without any human intervention.

Technology Category

Application Category

📝 Abstract

For robots to become efficient helpers in the home, they must learn to perform new mobile manipulation tasks simply by watching humans perform them. Learning from a single video demonstration from a human is challenging as the robot needs to first extract from the demo what needs to be done and how, translate the strategy from a third to a first-person perspective, and then adapt it to be successful with its own morphology. Furthermore, to mitigate the dependency on costly human monitoring, this learning process should be performed in a safe and autonomous manner. We present SafeMimic, a framework to learn new mobile manipulation skills safely and autonomously from a single third-person human video. Given an initial human video demonstration of a multi-step mobile manipulation task, SafeMimic first parses the video into segments, inferring both the semantic changes caused and the motions the human executed to achieve them and translating them to an egocentric reference. Then, it adapts the behavior to the robot's own morphology by sampling candidate actions around the human ones, and verifying them for safety before execution in a receding horizon fashion using an ensemble of safety Q-functions trained in simulation. When safe forward progression is not possible, SafeMimic backtracks to previous states and attempts a different sequence of actions, adapting both the trajectory and the grasping modes when required for its morphology. As a result, SafeMimic yields a strategy that succeeds in the demonstrated behavior and learns task-specific actions that reduce exploration in future attempts. Our experiments show that our method allows robots to safely and efficiently learn multi-step mobile manipulation behaviors from a single human demonstration, from different users, and in different environments, with improvements over state-of-the-art baselines across seven tasks

Problem

Research questions and friction points this paper is trying to address.

Enabling robots to learn mobile manipulation tasks from human videos

Ensuring safe and autonomous adaptation of human demonstrations to robot morphology

Reducing exploration needs by learning task-specific actions from single demonstrations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Parses human video into semantic segments

Adapts actions to robot's morphology safely

Uses safety Q-functions for autonomous verification

🔎 Similar Papers

No similar papers found.