🤖 AI Summary
This work addresses the challenge of enabling robots to efficiently learn multi-task manipulation in real-world environments using only a few human demonstrations (≤10). The authors propose a trajectory warping method grounded in semantic keypoint correspondences, which generates robust open-loop policies from limited demonstrations. Integrated with a vision-language model, this approach forms a closed-loop system for task selection, execution, and evaluation, facilitating hours-long autonomous functional play. To the best of the authors' knowledge, this is the first demonstration of few-shot, long-horizon autonomous learning in the real world. The method significantly enhances policy generalization under spatial and semantic variations, ultimately achieving performance comparable to policies trained on large-scale human-collected datasets.
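The trajectory warping idea can be illustrated with a minimal sketch. The concrete choice below (a least-squares rigid transform estimated from matched keypoints via the Kabsch/Procrustes method, then applied to the demonstrated end-effector positions) is an assumption for illustration; the paper's actual warping formulation may differ, and all function names here are hypothetical.

```python
import numpy as np

def estimate_rigid_transform(src_kps, tgt_kps):
    """Least-squares rotation R and translation t mapping source keypoints
    to target keypoints (Kabsch algorithm). Both inputs are (K, 3) arrays
    of corresponding semantic keypoints."""
    src_c = src_kps.mean(axis=0)
    tgt_c = tgt_kps.mean(axis=0)
    # Cross-covariance of the centered keypoint sets.
    H = (src_kps - src_c).T @ (tgt_kps - tgt_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection in the least-squares solution.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = tgt_c - R @ src_c
    return R, t

def warp_trajectory(demo_traj, src_kps, tgt_kps):
    """Warp a demonstrated trajectory of (N, 3) end-effector positions into
    the target scene by anchoring it to keypoint correspondences."""
    R, t = estimate_rigid_transform(src_kps, tgt_kps)
    return demo_traj @ R.T + t
```

Under this simplification, a demonstration recorded with objects in one pose transfers to a target scene whose keypoints are rotated and shifted, which is one way the open-loop policy could remain robust to spatial variation.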
📝 Abstract
The ability to conduct and learn from interaction and experience is a central challenge in robotics, offering a scalable alternative to labor-intensive human demonstrations. However, realizing such "play" requires (1) a policy robust to diverse, potentially out-of-distribution environment states, and (2) a procedure that continuously produces useful robot experience. To address these challenges, we introduce Tether, a method for autonomous functional play involving structured, task-directed interactions. First, we design a novel open-loop policy that warps actions from a small set of source demonstrations (≤10) by anchoring them to semantic keypoint correspondences in the target scene. We show that this design is extremely data-efficient and robust even under significant spatial and semantic variations. Second, we deploy this policy for autonomous functional play in the real world via a continuous cycle of task selection, execution, evaluation, and improvement, guided by the visual understanding capabilities of vision-language models. This procedure generates diverse, high-quality datasets with minimal human intervention. In a household-like multi-object setup, our method is the first to perform many hours of autonomous multi-task play in the real world starting from only a handful of demonstrations. This produces a stream of data that consistently improves the performance of closed-loop imitation policies over time, ultimately yielding over 1000 expert-level trajectories and training policies competitive with those learned from human-collected demonstrations.
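The select/execute/evaluate/improve cycle can be sketched as a simple loop. Everything below is a hypothetical skeleton: the component interfaces (`select_task`, `execute`, `evaluate`) stand in for the VLM-guided task selector, the open-loop warped policy, and the VLM success judge, and none of these names come from the paper.

```python
def functional_play(tasks, select_task, execute, evaluate, n_rounds):
    """Run n_rounds of autonomous play, keeping only rollouts judged
    successful so the collected dataset stays expert-level.

    tasks:       candidate manipulation tasks in the scene
    select_task: picks a feasible task (e.g. a VLM reasoning over an image)
    execute:     rolls out the open-loop warped policy for the chosen task
    evaluate:    judges success from post-execution observations (e.g. a VLM)
    """
    dataset = []
    for _ in range(n_rounds):
        task = select_task(tasks)
        trajectory = execute(task)
        if evaluate(task, trajectory):
            dataset.append((task, trajectory))  # successful rollout -> training data
    return dataset
```

The growing dataset can then periodically retrain a closed-loop imitation policy, matching the improvement cycle the abstract describes.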