🤖 AI Summary
Existing egocentric 3D datasets typically lack some combination of ground-truth 3D geometry, day-to-night illumination variation, and complete 6-degree-of-freedom (6DoF) pose annotations, which hinders robust evaluation of egocentric 3D perception. To address this, we introduce Oxford Day-and-Night, a large-scale egocentric 3D dataset with complete ground truth, captured in the same scenes by day and by night: 30 km of trajectories across 40,000 m² of real-world environments. It provides high-accuracy 6DoF camera poses, densely reconstructed point clouds, cross-temporal geometric alignment between sessions, and illumination labels for each recording epoch. Using Meta ARIA glasses, we acquire synchronized video streams and employ multi-session SLAM for precise pose estimation and dense reconstruction. The dataset enables two benchmarks, novel-view synthesis and visual relocalisation under extreme lighting change, filling a critical gap in egocentric 3D perception evaluation and making it possible to validate model generalization and robustness in low-light and dynamically lit real-world scenarios.
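Cross-session alignment is central to the dataset: day and night sequences are registered into a shared reference frame. Below is a minimal, illustrative sketch of one standard way to perform this kind of registration, assuming corresponding camera centres between two sessions and using the closed-form Umeyama similarity alignment. This is not the paper's method (alignment there comes from multi-session SLAM), and all names and data below are hypothetical.

```python
import numpy as np

def umeyama_alignment(src, dst):
    """Least-squares similarity transform (s, R, t) so that dst ~ s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding camera centres from two sessions.
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)                # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:    # guard against reflections
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_src = src_c.var(axis=0).sum()               # mean squared norm of src_c
    s = np.trace(np.diag(D) @ S) / var_src          # optimal scale (Umeyama 1991)
    t = mu_dst - s * R @ mu_src
    return s, R, t

# Hypothetical usage: synthetic day/night camera centres with known correspondences.
day = np.random.rand(100, 3) * 50.0
night = 1.2 * day + np.array([3.0, -1.0, 0.5])      # scaled, shifted "night" frame
s, R, t = umeyama_alignment(night, day)             # map night frame into day frame
night_aligned = s * night @ R.T + t
print(np.abs(night_aligned - day).max())            # ~0 for this noise-free example
```

In practice, correspondences would come from feature matching or shared map points rather than being given, and multi-session SLAM jointly refines all poses instead of applying a single closed-form transform.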
📝 Abstract
We introduce Oxford Day-and-Night, a large-scale, egocentric dataset for novel view synthesis (NVS) and visual relocalisation under challenging lighting conditions. Existing datasets often lack crucial combinations of features such as ground-truth 3D geometry, wide-ranging lighting variation, and full 6DoF motion. Oxford Day-and-Night addresses these gaps by leveraging Meta ARIA glasses to capture egocentric video and applying multi-session SLAM to estimate camera poses, reconstruct 3D point clouds, and align sequences captured under varying lighting conditions, including both day and night. The dataset spans over 30 km of recorded trajectories and covers an area of 40,000 m², offering a rich foundation for egocentric 3D vision research. It supports two core benchmarks, NVS and relocalisation, providing a unique platform for evaluating models in realistic and diverse environments.
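For the relocalisation benchmark, evaluation typically reports the fraction of query images whose estimated 6DoF poses fall within joint translation and rotation error bounds. A minimal sketch of that metric follows, assuming the threshold values commonly used in long-term visual localisation benchmarks; the paper's exact protocol may differ.

```python
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Translation error in metres and rotation error in degrees."""
    t_err = float(np.linalg.norm(t_est - t_gt))
    cos_angle = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0          # from the relative rotation
    r_err = float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))
    return t_err, r_err

def recall_at_thresholds(errors, thresholds=((0.25, 2.0), (0.5, 5.0), (5.0, 10.0))):
    """Fraction of queries with t_err <= metres AND r_err <= degrees per threshold.

    errors: (N, 2) array, one (t_err, r_err) row per query image.
    Default thresholds are the common long-term localisation choices (an assumption).
    """
    errors = np.asarray(errors)
    return {f"{m}m/{d}deg": float(np.mean((errors[:, 0] <= m) & (errors[:, 1] <= d)))
            for m, d in thresholds}

# Hypothetical usage with a single perfectly localised query.
e = pose_errors(np.eye(3), np.zeros(3), np.eye(3), np.zeros(3))
print(recall_at_thresholds([e]))                                 # all recalls = 1.0
```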