Sim-and-Human Co-training for Data-Efficient and Generalizable Robotic Manipulation

📅 2026-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the poor policy generalization and low data efficiency of robotic manipulation caused by the visual discrepancy between simulation and reality (sim-to-real) and the embodiment mismatch between humans and robots (human-to-robot). To bridge these dual gaps, we propose SimHum, a co-training framework that systematically integrates kinematic priors from simulated robot trajectories with visual priors derived from real human demonstrations. Through joint training, SimHum effectively aligns these complementary sources of information. Under identical data budgets, SimHum achieves up to a 40% performance improvement over baselines. Notably, with only 80 real human demonstrations, it attains a 62.5% success rate on out-of-domain tasks, surpassing purely real-data baselines by a factor of 7.1.

📝 Abstract
Synthetic simulation data and real-world human data provide scalable alternatives that circumvent the prohibitive cost of robot data collection. However, these sources suffer from the sim-to-real visual gap and the human-to-robot embodiment gap, respectively, which limits policy generalization to real-world scenarios. In this work, we identify a natural yet underexplored complementarity between the two sources: simulation offers the robot actions that human data lack, while human data provide the real-world observations that simulation struggles to render. Motivated by this insight, we present SimHum, a co-training framework that simultaneously extracts a kinematic prior from simulated robot actions and a visual prior from real-world human observations. Building on these two complementary priors, we achieve data-efficient and generalizable robotic manipulation in real-world tasks. Empirically, SimHum outperforms the baseline by up to 40% under the same data-collection budget, and achieves a 62.5% out-of-domain (OOD) success rate with only 80 real demonstrations, outperforming the real-only baseline by 7.1×. Videos and additional information can be found at the project website: https://kaipengfang.github.io/sim-and-human
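The co-training idea in the abstract can be illustrated with a toy sketch: a shared encoder receives gradients from two objectives, an action-prediction loss on simulated data (which has robot actions) and a reconstruction loss on real human observations (which have no actions). All names, dimensions, and the linear model below are our own illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed shapes): sim provides (obs, action) pairs;
# human data provides observations only.
sim_obs = rng.normal(size=(64, 4))
sim_act = sim_obs @ rng.normal(size=(4, 2))     # ground-truth robot actions (sim only)
human_obs = rng.normal(size=(64, 4))            # real observations, no actions

W_enc = rng.normal(size=(4, 3)) * 0.1           # shared encoder, updated by both losses
W_act = rng.normal(size=(3, 2)) * 0.1           # action head (kinematic prior, sim data)
W_rec = rng.normal(size=(3, 4)) * 0.1           # reconstruction head (visual prior, human data)

def joint_loss():
    e_kin = sim_obs @ W_enc @ W_act - sim_act       # action-prediction error on sim data
    e_vis = human_obs @ W_enc @ W_rec - human_obs   # reconstruction error on human obs
    return np.mean(e_kin**2) + np.mean(e_vis**2)

lr = 0.05
loss_before = joint_loss()
for _ in range(300):
    e_kin = sim_obs @ W_enc @ W_act - sim_act
    e_vis = human_obs @ W_enc @ W_rec - human_obs
    # Analytic gradients of the summed mean-squared objectives.
    g_act = (sim_obs @ W_enc).T @ e_kin * (2 / e_kin.size)
    g_rec = (human_obs @ W_enc).T @ e_vis * (2 / e_vis.size)
    # The encoder gradient mixes both sources -- this is the co-training step.
    g_enc = (sim_obs.T @ e_kin @ W_act.T) * (2 / e_kin.size) \
          + (human_obs.T @ e_vis @ W_rec.T) * (2 / e_vis.size)
    W_act -= lr * g_act
    W_rec -= lr * g_rec
    W_enc -= lr * g_enc
loss_after = joint_loss()   # joint objective decreases as both priors are absorbed
```

The key point the sketch captures is that neither data source alone constrains the full pipeline: only the simulated batch touches the action head, and only the human batch anchors the encoder to real observations.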
Problem

Research questions and friction points this paper is trying to address.

sim-to-real gap
embodiment gap
robotic manipulation
data efficiency
generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

co-training
sim-to-real gap
embodiment gap
kinematic prior
visual prior
Kaipeng Fang
University of Electronic Science and Technology of China
Weiqing Liang
University of Electronic Science and Technology of China
Yuyang Li
Institute for AI, Peking University
Robotic Manipulation · Tactile Sensing · Human-Object Interaction
Ji Zhang
Southwest Jiaotong University
Pengpeng Zeng
Tongji University
Computer Vision
Lianli Gao
University of Electronic Science and Technology of China
Vision and Language
Heng Tao Shen
Tongji University
Jingkuan Song
Tongji University, Shanghai Innovation Institute