ExoGS: A 4D Real-to-Sim-to-Real Framework for Scalable Manipulation Data Collection

📅 2026-01-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing real-to-sim-to-real approaches, which predominantly focus on visual transfer and struggle to model dynamic physical interactions in the real world, a gap that particularly hinders data generation for contact-rich manipulation tasks. To overcome this, the authors propose the ExoGS framework, which uses a custom passive exoskeleton, AirExo-3, to synchronously capture high-fidelity human motion and RGB images. These recordings are reconstructed into editable, dynamic 4D scenes based on 3D Gaussian Splatting, enabling geometrically consistent replay and large-scale data augmentation in simulation. By integrating a lightweight semantic adapter, the method achieves, for the first time, seamless transfer of real-world dynamic interactions into simulation. The approach significantly outperforms conventional teleoperation baselines, improving both data efficiency and the cross-domain generalization of policies in real-world settings. Code and hardware designs are publicly released.

📝 Abstract
The Real-to-Sim-to-Real technique is gaining increasing interest in robotic manipulation, as it can generate scalable data in simulation while maintaining a narrower sim-to-real gap. However, previous methods have mainly focused on environment-level visual real-to-sim transfer, ignoring the transfer of interactions, which can be challenging and inefficient to obtain purely in simulation, especially for contact-rich tasks. We propose ExoGS, a robot-free 4D Real-to-Sim-to-Real framework that captures both static environments and dynamic interactions in the real world and transfers them seamlessly to a simulated environment. It provides a new solution for scalable manipulation data collection and policy learning. ExoGS employs a self-designed, robot-isomorphic passive exoskeleton, AirExo-3, to capture kinematically consistent trajectories with millimeter-level accuracy, along with synchronized RGB observations, during direct human demonstrations. The robot, objects, and environment are reconstructed as editable 3D Gaussian Splatting assets, enabling geometry-consistent replay and large-scale data augmentation. Additionally, a lightweight Mask Adapter injects instance-level semantics into the policy to enhance robustness under visual domain shifts. Real-world experiments demonstrate that ExoGS significantly improves data efficiency and policy generalization compared to teleoperation-based baselines. Code and hardware files have been released at https://github.com/zaixiabalala/ExoGS.
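The abstract states that a lightweight Mask Adapter injects instance-level semantics into the policy, but the page gives no architectural details. As a loose illustrative sketch only (the function name and the channel-concatenation scheme are assumptions, not the authors' design), one minimal way to feed instance masks to a visual policy is to append them to the RGB observation as extra channels:

```python
import numpy as np

def inject_instance_masks(rgb, masks):
    """Append per-instance binary masks to an RGB observation as extra
    channels. This is a hypothetical stand-in for a mask-conditioned
    policy input, not the paper's actual Mask Adapter.

    rgb:   (H, W, 3) float array, pixel values in [0, 1]
    masks: (K, H, W) binary array, one channel per object instance
    returns: (H, W, 3 + K) array for the policy's visual encoder
    """
    assert rgb.shape[:2] == masks.shape[1:], "spatial sizes must match"
    # Move the instance axis last so masks align with the RGB channels.
    mask_channels = masks.transpose(1, 2, 0).astype(rgb.dtype)  # (H, W, K)
    return np.concatenate([rgb, mask_channels], axis=-1)

# Example: a 4x4 image with 2 object instances yields 5 input channels.
obs = inject_instance_masks(np.zeros((4, 4, 3)), np.ones((2, 4, 4)))
```

In practice such semantic channels can make a policy less sensitive to texture and lighting shifts, since the instance layout survives visual domain changes; how ExoGS actually fuses this signal is described in the paper itself.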
Problem

Research questions and friction points this paper is trying to address.

Real-to-Sim-to-Real
robotic manipulation
dynamic interaction transfer
contact-rich tasks
scalable data collection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Real-to-Sim-to-Real
4D manipulation data
3D Gaussian Splatting
passive exoskeleton
Mask Adapter
Yiming Wang (Shanghai Jiao Tong University)
Ruogu Zhang (Shanghai Jiao Tong University)
Minyang Li (Shanghai Jiao Tong University)
Hao Shi (Shanghai Jiao Tong University)
Junbo Wang (Shanghai Jiao Tong University)
Deyi Li (Shanghai Jiao Tong University)
Jieji Ren (Shanghai Jiao Tong University)
Wenhai Liu (Shanghai Jiao Tong University)
Weiming Wang (Shanghai Jiao Tong University)
Hao-Shu Fang (Massachusetts Institute of Technology)