AirExo-2: Scaling up Generalizable Robotic Imitation Learning with Low-Cost Exoskeletons

📅 2025-03-05
🤖 AI Summary
Problem: Teleoperation is an expensive and inefficient way to collect demonstrations because it depends on physical robot platforms, which limits how far imitation learning data can scale. Method: The paper proposes AirExo-2, a low-cost exoskeleton system for large-scale in-the-wild demonstration collection, together with a demonstration adaptor that transforms the collected in-the-wild demonstrations into pseudo-robot demonstrations. It also presents RISE-2, a generalizable imitation learning policy that integrates 2D and 3D perception. Contribution/Results: Trained only on in-the-wild demonstrations collected and transformed by AirExo-2, with no additional robot demonstrations, RISE-2 achieves performance comparable or superior to policies trained on teleoperated data. It also outperforms previous imitation learning policies on both in-domain and out-of-domain tasks, even with limited demonstrations.

📝 Abstract
Scaling up imitation learning for real-world applications requires efficient and cost-effective demonstration collection methods. Current teleoperation approaches, though effective, are expensive and inefficient due to the dependency on physical robot platforms. Alternative data sources like in-the-wild demonstrations can eliminate the need for physical robots and offer more scalable solutions. However, existing in-the-wild data collection devices have limitations: handheld devices offer restricted in-hand camera observation, while whole-body devices often require fine-tuning with robot data due to action inaccuracies. In this paper, we propose AirExo-2, a low-cost exoskeleton system for large-scale in-the-wild demonstration collection. By introducing the demonstration adaptor to transform the collected in-the-wild demonstrations into pseudo-robot demonstrations, our system addresses key challenges in utilizing in-the-wild demonstrations for downstream imitation learning in real-world environments. Additionally, we present RISE-2, a generalizable policy that integrates 2D and 3D perceptions, outperforming previous imitation learning policies in both in-domain and out-of-domain tasks, even with limited demonstrations. By leveraging in-the-wild demonstrations collected and transformed by the AirExo-2 system, without the need for additional robot demonstrations, RISE-2 achieves comparable or superior performance to policies trained with teleoperated data, highlighting the potential of AirExo-2 for scalable and generalizable imitation learning. Project page: https://airexo.tech/airexo2
Problem

Research questions and friction points this paper is trying to address.

Develop a low-cost exoskeleton for scalable demonstration collection without a physical robot.
Transform in-the-wild demonstrations into pseudo-robot demonstrations usable for imitation learning.
Create a generalizable policy that integrates 2D and 3D perception.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Low-cost exoskeleton for scalable in-the-wild demonstration collection
Demonstration adaptor transforms in-the-wild demonstrations into pseudo-robot demonstrations
RISE-2 policy integrates 2D and 3D perception
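
As a rough illustration of what turning exoskeleton recordings into pseudo-robot demonstrations could involve, the toy sketch below converts a joint-angle trajectory into end-effector state/action pairs. Everything in it is an assumption made for illustration: the planar two-link arm, the link lengths, the function names `forward_kinematics_2link` and `to_pseudo_robot_demo`, and the "next pose as action" convention. It is not the paper's demonstration adaptor.

```python
import math


def forward_kinematics_2link(theta1, theta2, l1=0.3, l2=0.25):
    """Planar two-link forward kinematics (hypothetical arm, lengths in meters).

    Maps joint angles in radians to an end-effector pose (x, y, yaw).
    """
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    yaw = theta1 + theta2
    return x, y, yaw


def to_pseudo_robot_demo(joint_trajectory):
    """Turn recorded joint angles into (state, action) pairs.

    Assumed convention: the action at step t is the end-effector pose
    reached at step t + 1, so a trajectory of N samples yields N - 1 pairs.
    """
    poses = [forward_kinematics_2link(t1, t2) for t1, t2 in joint_trajectory]
    return [
        {"state": poses[i], "action": poses[i + 1]}
        for i in range(len(poses) - 1)
    ]


if __name__ == "__main__":
    # Hypothetical encoder readings (radians) from a short recording.
    recording = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.4)]
    for step in to_pseudo_robot_demo(recording):
        print(step)
```

A real system would also need to handle the visual side of each demonstration and a full 6-DoF arm model; this sketch covers only the action-labeling idea.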
Authors

Hongjie Fang, Shanghai Jiao Tong University (Robotics, Robot Learning, Robotic Manipulation)
Chenxi Wang, Shanghai Jiao Tong University, Shanghai Noematrix Intelligence Technology Ltd.
Yiming Wang, Shanghai Jiao Tong University
Jingjing Chen, Fudan University (Multimedia, Computer Vision, Machine Learning, Pattern recognition)
Shangning Xia, Shanghai Jiao Tong University
Jun Lv, Shanghai Jiao Tong University (Embodied AI, Robot Learning, Artificial Intelligence)
Zihao He, Shanghai Jiao Tong University
Xiyan Yi, Shanghai Jiao Tong University
Yunhan Guo, Shanghai Jiao Tong University
Xinyu Zhan, Shanghai Jiao Tong University
Lixin Yang, Shanghai Jiao Tong University
Weiming Wang, Shanghai Jiao Tong University
Cewu Lu, Shanghai Jiao Tong University
Hao-Shu Fang, Massachusetts Institute of Technology (Robotic Manipulation, Robot Learning, Computer Vision)