🤖 AI Summary
Existing 4D dynamic-scene reconstruction lacks support from multi-view egocentric data. To address this, the authors introduce the first multi-view egocentric video dataset tailored for dynamic social scenarios, covering five real-world scenes (meetings, performances, and a presentation), each captured simultaneously by five participants wearing synchronized AR glasses, with sub-millisecond temporal alignment and accurate pose annotations. They contribute a custom hardware synchronization system for arrays of AR glasses, a unified pipeline for multi-camera calibration and pose estimation, and an evaluation framework for 4D reconstruction and free-viewpoint video (FVV) generation. Experiments validate the dataset's practical utility and effectiveness for FVV synthesis, bridging the gap in both data and methodology for multi-view egocentric reconstruction of dynamic social interactions. The work establishes a reproducible benchmark and releases all resources, including data, code, and models, as open source.
📝 Abstract
Multi-view egocentric dynamic scene reconstruction holds significant research value for holographic documentation of social interactions. However, existing reconstruction datasets focus on static multi-view or single egocentric-view setups, and no multi-view egocentric dataset exists for dynamic scene reconstruction. We therefore present MultiEgo, the first multi-view egocentric dataset for 4D dynamic scene reconstruction. The dataset comprises five canonical social interaction scenes: meetings, performances, and a presentation. Each scene provides five authentic egocentric videos captured by participants wearing AR glasses. We design a hardware-based data acquisition system and processing pipeline that achieves sub-millisecond temporal synchronization across views, coupled with accurate pose annotations. Experimental validation demonstrates the practical utility and effectiveness of our dataset for free-viewpoint video (FVV) applications, establishing MultiEgo as a foundational resource for advancing multi-view egocentric dynamic scene reconstruction research.