RoSHI: A Versatile Robot-oriented Suit for Human Data In-the-Wild

📅 2026-04-08
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing methods struggle to simultaneously achieve portability, robustness to occlusion, and global consistency when capturing high-fidelity, long-duration human motion data in the wild. This work proposes a hybrid wearable system that fuses low-cost, sparse IMUs with Project Aria smart glasses, integrating first-person visual SLAM and human mesh reconstruction to enable stable estimation of full 3D pose and body shape in metric space. The IMUs are robust to fast motion and occlusion, while visual SLAM keeps the global trajectory consistent over long sequences. Combining the two, the method significantly outperforms existing first-person baselines on an agile motion dataset and matches the performance of a state-of-the-art third-person approach (SAM3D). The system further demonstrates practical utility by enabling policy learning for real humanoid robots.
πŸ“ Abstract
Scaling up robot learning will likely require human data containing rich and long-horizon interactions in the wild. Existing approaches for collecting such data trade off portability, robustness to occlusion, and global consistency. We introduce RoSHI, a hybrid wearable that fuses low-cost sparse IMUs with the Project Aria glasses to estimate the full 3D pose and body shape of the wearer in a metric global coordinate frame from egocentric perception. This system is motivated by the complementarity of the two sensors: IMUs provide robustness to occlusions and high-speed motions, while egocentric SLAM anchors long-horizon motion and stabilizes upper body pose. We collect a dataset of agile activities to evaluate RoSHI. On this dataset, we generally outperform other egocentric baselines and perform comparably to a state-of-the-art exocentric baseline (SAM3D). Finally, we demonstrate that the motion data recorded from our system are suitable for real-world humanoid policy learning. For videos, data and more, visit the project webpage: https://roshi-mocap.github.io/
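The abstract's core idea — IMUs are locally accurate but drift over long horizons, while egocentric SLAM provides globally consistent but intermittent anchors — can be illustrated with a minimal complementary-style fusion sketch. This is not the paper's actual estimator (which recovers full 3D pose and body shape); it is a hypothetical toy showing how sparse SLAM fixes can correct a drifting IMU-integrated trajectory. The function name and signature are illustrative only.

```python
import numpy as np

def fuse_trajectory(imu_pos, slam_pos, slam_valid, alpha=0.5):
    """Blend a drifting IMU dead-reckoned trajectory with intermittent
    SLAM position fixes.

    imu_pos    : (T, D) IMU-integrated positions (locally smooth, drifts)
    slam_pos   : (T, D) SLAM positions (globally consistent, sparse)
    slam_valid : (T,)   bool mask, True where a SLAM fix is available
    alpha      : smoothing factor for the running drift correction

    Illustrative complementary-style filter, not the paper's method.
    """
    fused = np.empty_like(imu_pos, dtype=float)
    correction = np.zeros(imu_pos.shape[1])
    for t in range(len(imu_pos)):
        if slam_valid[t]:
            # Pull the running drift estimate toward the SLAM anchor.
            correction = alpha * correction + (1 - alpha) * (slam_pos[t] - imu_pos[t])
        # Between fixes, propagate the last correction so the IMU's
        # locally smooth motion is preserved.
        fused[t] = imu_pos[t] + correction
    return fused
```

On a trajectory with linearly accumulating IMU drift and a SLAM fix every 10 frames, the fused trajectory's mean error drops well below the raw IMU error, mirroring the complementarity argument in the abstract.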
Problem

Research questions and friction points this paper is trying to address.

human data collection
in-the-wild
robot learning
motion capture
wearable sensing
Innovation

Methods, ideas, or system contributions that make the work stand out.

wearable motion capture
egocentric perception
IMU-SLAM fusion
3D human pose estimation
robot learning from human data
Wenjing Margaret Mao
Department of Electrical and Systems Engineering, University of Pennsylvania
Jefferson Ng
Department of Electrical and Systems Engineering, University of Pennsylvania
Luyang Hu
Department of Electrical and Systems Engineering, University of Pennsylvania
Daniel Gehrig
Postdoctoral researcher, GRASP Lab, University of Pennsylvania
Computer Vision · Deep Learning · Event Cameras · Robotics
Antonio Loquercio
Assistant Professor, University of Pennsylvania
Robotics · Computer Vision · Machine Learning · Artificial Intelligence