🤖 AI Summary
This work addresses a bottleneck in robot learning: task performance typically scales sublinearly with human effort. We propose CASHER, a data pipeline that uses crowdsourced 3D scene reconstruction to build digital twins, enabling large-scale behavioral data collection in simulation and achieving superlinear scaling (“human effort ↓, performance ↑”). To our knowledge, this is the first demonstration of a superlinear relationship between human input and robotic task performance. CASHER introduces a simulation-driven mechanism that substitutes model-generated demonstrations for human ones, supporting zero-shot or few-shot generalization to unseen scenes. Technically, it integrates photogrammetric 3D reconstruction, digital twin construction, RL initialization, universal policy pretraining, sim-to-real closed-loop transfer, and video-scan-driven fine-tuning. We validate the scaling law on three real-robot tasks; adapting a policy to a new scene requires only a video scan of that scene, with no manual annotation or human-in-the-loop interaction.
📝 Abstract
Scaling robot learning requires data collection pipelines that scale favorably with human effort. In this work, we propose Crowdsourcing and Amortizing Human Effort for Real-to-Sim-to-Real (CASHER), a pipeline for scaling up data collection and learning in simulation where performance scales superlinearly with human effort. The key idea is to crowdsource digital twins of real-world scenes using 3D reconstruction and collect large-scale data in simulation rather than in the real world. Data collection in simulation is initially driven by RL, bootstrapped with human demonstrations. As the training of a generalist policy progresses across environments, its generalization capabilities can be used to replace human effort with model-generated demonstrations. This results in a pipeline where behavioral data is collected in simulation with continually decreasing human effort. We show that CASHER exhibits zero-shot and few-shot scaling laws on three real-world tasks across diverse scenarios. We also show that CASHER enables fine-tuning of pre-trained policies to a target scenario using a video scan, without any additional human effort. See our project website: https://casher-robot-learning.github.io/CASHER/
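The amortization idea in the abstract, where model-generated demonstrations gradually replace human ones as the generalist policy improves, can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the success threshold, and the assumption that zero-shot success grows with the number of scenes already trained on are all hypothetical, chosen only to make the control flow concrete.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PipelineStats:
    human_demo_scenes: int = 0   # scenes that needed human demonstrations
    model_demo_scenes: int = 0   # scenes bootstrapped by the generalist policy
    trained_scenes: List[str] = field(default_factory=list)


def collect_with_amortized_effort(
    scenes: List[str],
    zero_shot_success: Callable[[int], float],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> PipelineStats:
    """Toy loop: ask for human demos only when the generalist is too weak."""
    stats = PipelineStats()
    for scene in scenes:
        # Hypothetical estimate of how well the current generalist policy
        # performs in the new digital twin before any scene-specific training.
        success = zero_shot_success(len(stats.trained_scenes))
        if success >= threshold:
            # The generalist's own rollouts bootstrap RL in this scene.
            stats.model_demo_scenes += 1
        else:
            # Fall back to human demonstrations to bootstrap RL.
            stats.human_demo_scenes += 1
        # RL in simulation and distillation into the generalist would run
        # here; the sketch only records that the scene was used.
        stats.trained_scenes.append(scene)
    return stats


def toy_success(num_trained_scenes: int) -> float:
    # Assumed, purely illustrative: success improves as more scenes are seen.
    return min(1.0, num_trained_scenes / 10)


stats = collect_with_amortized_effort(
    [f"scene_{i}" for i in range(10)], toy_success
)
print(stats.human_demo_scenes, stats.model_demo_scenes)
```

Under these assumptions, human demonstrations are requested only for the early scenes, after which the generalist policy supplies its own, so the human effort per scene shrinks as the number of environments grows.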