Robot Learning with Super-Linear Scaling

📅 2024-12-02
🏛️ arXiv.org
📈 Citations: 5
Influential: 0
🤖 AI Summary
This work addresses a bottleneck in robot learning: task performance typically scales sublinearly with human effort. We propose CASHER, a data pipeline that uses crowdsourced 3D scene reconstruction to build digital twins and then collects large-scale behavioral data in simulation, so that performance scales superlinearly with human effort (performance keeps rising while the marginal human effort required falls). To our knowledge, this is the first demonstration of a superlinear relationship between human input and robot task performance. As a generalist policy trained across environments improves, CASHER substitutes model-generated demonstrations for human ones in simulation, supporting zero-shot or few-shot generalization to unseen scenes. Technically, the pipeline combines photogrammetric 3D reconstruction, digital twin construction, RL bootstrapped with human demonstrations, generalist policy pretraining, closed-loop sim-to-real transfer, and fine-tuning from a video scan. We validate the scaling behavior on three real-robot tasks. Adapting a pre-trained policy to a new scene requires only a video scan of that scene, with no manual annotation or human-in-the-loop interaction.

📝 Abstract
Scaling robot learning requires data collection pipelines that scale favorably with human effort. In this work, we propose Crowdsourcing and Amortizing Human Effort for Real-to-Sim-to-Real (CASHER), a pipeline for scaling up data collection and learning in simulation where performance scales superlinearly with human effort. The key idea is to crowdsource digital twins of real-world scenes using 3D reconstruction and to collect large-scale data in simulation rather than in the real world. Data collection in simulation is initially driven by RL, bootstrapped with human demonstrations. As the training of a generalist policy progresses across environments, its generalization capabilities can be used to replace human effort with model-generated demonstrations. The result is a pipeline in which behavioral data is collected in simulation with continually decreasing human effort. We show that CASHER demonstrates zero-shot and few-shot scaling laws on three real-world tasks across diverse scenarios. We also show that CASHER enables fine-tuning of pre-trained policies to a target scenario using only a video scan, without any additional human effort. See our project website: https://casher-robot-learning.github.io/CASHER/
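The amortization idea in the abstract (human demonstrations bootstrap RL in early environments, then the generalist policy's own rollouts take over) can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `HypotheticalGeneralist`, `collect_data`, the success-rate growth curve, the `threshold`, and the fixed `demos_per_env` cost are all hypothetical, and each environment is reduced to a single "difficulty" number.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HypotheticalGeneralist:
    """Toy stand-in for the generalist policy (hypothetical, not CASHER's model).

    Its success rate in a new environment grows with the number of environments
    it has already been trained on; this growth curve is an assumption.
    """
    trained_envs: int = 0

    def success_rate(self, difficulty: float) -> float:
        skill = 1.0 - 1.0 / (1.0 + 0.5 * self.trained_envs)
        return max(0.0, skill - difficulty)

    def train_on(self) -> None:
        # Stands in for RL training in the digital twin plus distillation
        # into the generalist; here it only increments a counter.
        self.trained_envs += 1


def collect_data(
    difficulties: List[float],
    policy: HypotheticalGeneralist,
    threshold: float = 0.3,
    demos_per_env: int = 5,
) -> Tuple[int, List[str]]:
    """Visit crowdsourced digital twins in order.

    Returns (total human demos used, demonstration source per environment).
    """
    human_demos = 0
    sources = []
    for difficulty in difficulties:
        if policy.success_rate(difficulty) >= threshold:
            # Generalist rollouts are good enough to bootstrap RL here:
            # model-generated demonstrations replace human ones.
            sources.append("model")
        else:
            # Otherwise bootstrap RL with an assumed fixed number of human demos.
            human_demos += demos_per_env
            sources.append("human")
        policy.train_on()
    return human_demos, sources


human, sources = collect_data([0.1] * 5, HypotheticalGeneralist())
print(human, sources)  # 10 ['human', 'human', 'model', 'model', 'model']
```

In this toy run, human effort stops growing after the second environment while every later environment still contributes data, which is the qualitative pattern the abstract describes. A real pipeline would presumably also check model-generated rollouts for task success in simulation before using them; that step is omitted here.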
Problem

Research questions and friction points this paper is trying to address.

Scaling robot data collection superlinearly with human effort
Crowdsourcing digital twins for simulation-based learning
Reducing human involvement through model-generated demonstrations
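One way to make "superlinear" concrete is to assume a power law, performance ≈ c · effort^k, and estimate k by least squares on log-transformed data: k > 1 is superlinear, k = 1 linear, and k < 1 sublinear. This sketch is an illustrative assumption about how such a claim could be checked. The function name `power_law_exponent` is hypothetical, and the data below is synthetic, not results from the paper.

```python
import math
from typing import Sequence


def power_law_exponent(effort: Sequence[float], performance: Sequence[float]) -> float:
    """Least-squares slope of log(performance) against log(effort).

    Under the assumed model performance ~ c * effort**k, the slope estimates k.
    All inputs must be positive.
    """
    xs = [math.log(e) for e in effort]
    ys = [math.log(p) for p in performance]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic illustration only: performance generated as effort**1.5.
effort = [1, 2, 4, 8, 16]
performance = [e ** 1.5 for e in effort]
print(round(power_law_exponent(effort, performance), 3))  # 1.5
```

Fitting in log space turns the power law into a straight line, so the exponent is simply the regression slope. Real measurements would be noisy and would need enough effort levels for the fitted slope to be meaningful.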
Innovation

Methods, ideas, or system contributions that make the work stand out.

Crowdsourcing digital twins via 3D reconstruction
Collecting simulation data using RL and demonstrations
Reducing human effort with model-generated demonstrations
Authors
M. Torné, Massachusetts Institute of Technology
Arhan Jain, University of Washington
Jiayi Yuan, Rice University
Vidaaranya Macha, University of Washington
Lars Ankile, Massachusetts Institute of Technology
A. Simeonov, Massachusetts Institute of Technology
Pulkit Agrawal, Massachusetts Institute of Technology
Abhishek Gupta, University of Washington