🤖 AI Summary
To address two problems in robotic imitation learning, the large number of demonstrations it requires and its poor cross-task generalization, this paper proposes Multi-Task Trajectory Transfer (MT3). The method decomposes each manipulation trajectory into two sequential phases, “alignment” and “interaction”. It then uses retrieval-based generalization to transfer knowledge across tasks and object instances from very few demonstrations, as few as one per task. MT3 therefore combines trajectory decomposition with retrieval rather than training a single monolithic behavioral-cloning policy. On a real-world robot platform, MT3 learned 1,000 distinct everyday tasks from less than 24 hours of human demonstration time. In the few-demonstration regime (fewer than 10 per task), decomposition improves data efficiency by an order of magnitude over single-phase behavioral cloning, which is a substantial step for few-shot, multi-task imitation learning.
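A quick back-of-the-envelope check puts the headline figure in per-task terms. This assumes the 24-hour budget is spread evenly over the 1,000 tasks, which the summary does not state. On that assumption, the average demonstrator time per task is under about a minute and a half:

$$
\frac{24\ \text{h} \times 3600\ \text{s/h}}{1000\ \text{tasks}} = 86.4\ \text{s per task}
$$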
📝 Abstract
Humans are remarkably efficient at learning tasks from demonstrations, but today's imitation learning methods for robot manipulation often require hundreds or thousands of demonstrations per task. We investigated two fundamental priors for improving learning efficiency. The first is decomposing manipulation trajectories into sequential alignment and interaction phases. The second is retrieval-based generalization. Through 3,450 real-world rollouts, we systematically studied this decomposition. We compared different design choices for the alignment and interaction phases. We also examined generalization and scaling trends relative to today's dominant paradigm, behavioral cloning with a single-phase monolithic policy. In the few-demonstrations-per-task regime (<10 demonstrations), decomposition achieved an order-of-magnitude improvement in data efficiency over single-phase learning. Retrieval also consistently outperformed behavioral cloning for both the alignment and interaction phases. Building on these insights, we developed Multi-Task Trajectory Transfer (MT3), an imitation learning method based on decomposition and retrieval. MT3 learns everyday manipulation tasks from as few as one demonstration each while also generalizing to previously unseen object instances. This efficiency enabled us to teach a robot 1,000 distinct everyday tasks in under 24 hours of human demonstrator time. Through 2,200 additional real-world rollouts, we reveal MT3's capabilities and limitations across different task families.
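The abstract names two priors, alignment/interaction decomposition and retrieval, without showing how they fit together. The Python sketch below is one minimal, hypothetical way to combine them and is not MT3's actual implementation. It rests on several simplifying assumptions. Poses are planar `(x, y, theta)` tuples. The object pose is already known. Each stored demonstration carries a hand-made feature vector. The interaction moves rigidly with the object. All names (`Demo`, `retrieve`, `transfer`, `plan`) are invented for illustration. The sketch first retrieves the most similar stored demonstration by cosine similarity. It then re-expresses that demonstration's interaction poses relative to the new object pose, and uses the first transferred pose as the alignment goal.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Assumption: planar poses (x, y, theta) in the world frame, not full 6-DoF poses.
Pose = Tuple[float, float, float]


def compose(a: Pose, b: Pose) -> Pose:
    """SE(2) composition: map pose b, expressed in frame a, into the world frame."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, at + bt)


def inverse(a: Pose) -> Pose:
    """SE(2) inverse, so that compose(a, inverse(a)) is the identity pose."""
    ax, ay, at = a
    c, s = math.cos(at), math.sin(at)
    return (-(c * ax + s * ay), -(-s * ax + c * ay), -at)


@dataclass
class Demo:
    task: str
    embedding: Sequence[float]  # hypothetical object/scene descriptor
    object_pose: Pose           # object pose observed during the demonstration
    interaction: List[Pose]     # end-effector poses recorded in the interaction phase


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def retrieve(memory: List[Demo], query: Sequence[float]) -> Demo:
    """Retrieval prior: pick the stored demo whose descriptor best matches the query."""
    return max(memory, key=lambda d: cosine(d.embedding, query))


def transfer(demo: Demo, new_object_pose: Pose) -> List[Pose]:
    """Express each demo pose relative to the demo object, then attach it to the new object."""
    offset = compose(new_object_pose, inverse(demo.object_pose))
    return [compose(offset, p) for p in demo.interaction]


def plan(memory: List[Demo], query: Sequence[float], new_object_pose: Pose):
    demo = retrieve(memory, query)
    interaction = transfer(demo, new_object_pose)
    # Alignment phase: bring the gripper to the first transferred interaction pose.
    alignment_goal = interaction[0]
    return demo.task, alignment_goal, interaction


if __name__ == "__main__":
    memory = [
        Demo("open drawer", (1.0, 0.0), (0.0, 0.0, 0.0), [(0.1, 0.0, 0.0), (0.0, 0.0, 0.0)]),
        Demo("pick up mug", (0.0, 1.0), (1.0, 1.0, 0.0), [(1.0, 1.2, 0.0), (1.0, 1.0, 0.0)]),
    ]
    task, goal, traj = plan(memory, (0.1, 0.9), (2.0, 0.0, math.pi / 2))
    print(task, goal, traj)
```

The split reflects the decomposition in the abstract. Only the alignment goal depends on where the new object is, and the interaction segment is replayed from the retrieved demonstration rather than predicted by a learned policy. In this example the mug demo's gripper offset of 0.2 above the mug is rotated by 90 degrees, so it lands 0.2 to the side of the relocated mug.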