EgoSim: An Egocentric Multi-view Simulator and Real Dataset for Body-worn Cameras during Motion and Activity

📅 2025-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing egocentric vision research predominantly relies on head-mounted cameras, limiting effective modeling of occluded lower-body motions. Method: We propose a novel paradigm using multi-position body-worn cameras (e.g., on arms and legs) and introduce EgoSim—the first high-fidelity egocentric simulator—alongside the real-world MultiEgoView dataset. Our approach uniquely integrates AMASS motion-capture data with neural rendering to jointly model motion artifacts, lower-limb occlusions, and motion blur. We collect 119 hours of synthetic and 5 hours of real synchronized six-view video–3D pose data. Contribution/Results: Our method substantially narrows the simulation-to-reality domain gap; end-to-end trained 3D pose estimation models achieve significant performance gains on real body-worn videos. Both code and datasets are publicly released, advancing egocentric human motion understanding beyond head-mounted setups.
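The summary describes synchronized six-view video with 3D pose ground truth. One way such synchronization is commonly done is nearest-neighbor matching of camera frame timestamps against the (typically higher-rate) mocap stream; the sketch below is illustrative only, and the 60 Hz/30 fps rates are assumptions rather than details from the paper:

```python
import bisect

def nearest_pose_index(pose_ts, frame_t):
    """Index of the mocap pose sample closest in time to a video frame.

    pose_ts must be sorted ascending (timestamps in seconds).
    """
    i = bisect.bisect_left(pose_ts, frame_t)
    if i == 0:
        return 0
    if i == len(pose_ts):
        return len(pose_ts) - 1
    # Pick whichever neighbor is temporally closer.
    return i if pose_ts[i] - frame_t < frame_t - pose_ts[i - 1] else i - 1

# Toy example: a 60 Hz pose stream paired with 30 fps camera frames.
pose_ts = [k / 60.0 for k in range(120)]   # 2 s of pose samples
frame_ts = [k / 30.0 for k in range(60)]   # 2 s of video frames
pairs = [(t, nearest_pose_index(pose_ts, t)) for t in frame_ts]
print(pairs[0])  # (0.0, 0)
```

The same lookup would be repeated per camera when the six streams are not frame-locked to each other.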

📝 Abstract
Research on egocentric tasks in computer vision has mostly focused on head-mounted cameras, such as fisheye cameras or embedded cameras inside immersive headsets. We argue that the increasing miniaturization of optical sensors will lead to the prolific integration of cameras into many more body-worn devices at various locations. This will bring fresh perspectives to established tasks in computer vision and benefit key areas such as human motion tracking, body pose estimation, or action recognition -- particularly for the lower body, which is typically occluded. In this paper, we introduce EgoSim, a novel simulator of body-worn cameras that generates realistic egocentric renderings from multiple perspectives across a wearer's body. A key feature of EgoSim is its use of real motion capture data to render motion artifacts, which are especially noticeable with arm- or leg-worn cameras. In addition, we introduce MultiEgoView, a dataset of egocentric footage from six body-worn cameras and ground-truth full-body 3D poses during several activities: 119 hours of data are derived from AMASS motion sequences in four high-fidelity virtual environments, which we augment with 5 hours of real-world motion data from 13 participants using six GoPro cameras and 3D body pose references from an Xsens motion capture suit. We demonstrate EgoSim's effectiveness by training an end-to-end video-only 3D pose estimation network. Analyzing its domain gap, we show that our dataset and simulator substantially aid training for inference on real-world data. EgoSim code & MultiEgoView dataset: https://siplab.org/projects/EgoSim
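The abstract evaluates an end-to-end 3D pose estimation network; the standard accuracy metric in this area is MPJPE (mean per-joint position error), the average Euclidean distance between predicted and ground-truth joints. A minimal sketch, where the 3-joint toy skeleton and millimeter units are illustrative assumptions rather than the paper's actual evaluation setup:

```python
import math

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance
    between predicted and ground-truth 3D joint positions."""
    assert len(pred) == len(gt), "joint counts must match"
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(dists) / len(dists)

# Toy example: 3 joints (mm), prediction offset by 10 mm along x.
gt   = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (0.0, 200.0, 0.0)]
pred = [(10.0, 0.0, 0.0), (110.0, 0.0, 0.0), (10.0, 200.0, 0.0)]
print(mpjpe(pred, gt))  # 10.0
```

Reported pose results in this literature are usually MPJPE after aligning the root joint, sometimes additionally after Procrustes alignment (PA-MPJPE).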
Problem

Research questions and friction points this paper is trying to address.

Egocentric vision research has focused almost exclusively on head-mounted cameras
Lower-body motion is typically occluded from head-worn viewpoints
Motion artifacts of arm- and leg-worn cameras are hard to capture without real motion data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Simulates body-worn cameras
Uses real motion capture data
Generates multi-perspective egocentric renderings
👥 Authors
Dominik Hollidt
Department of Computer Science, ETH Zürich, Switzerland
Paul Streli
PhD student, ETH Zurich (Computer Vision, Machine Learning, Human-Computer Interaction)
Jiaxi Jiang
Department of Computer Science, ETH Zürich, Switzerland
Yasaman Haghighi
Department of Computer Science, ETH Zürich, Switzerland
Changlin Qian
Department of Computer Science, ETH Zürich, Switzerland
Xintong Liu
Department of Computer Science, ETH Zürich, Switzerland
Christian Holz
Associate Professor, ETH Zurich (Mixed Reality, Perception, Human-Computer Interaction, Digital Biomarkers)