🤖 AI Summary
To address insufficient rolling-shutter distortion modeling and the limited accuracy of frame-rate VIO trajectories, which falls short of the pixel-level precision required for first-person HDR scene reconstruction, this paper proposes a high-fidelity, physics-aware reconstruction framework. First, it introduces visual-inertial bundle adjustment (VIBA) for millisecond-scale joint calibration of the RGB camera's timestamps and motion, enabling precise rolling-shutter compensation. Second, it integrates a physically grounded imaging model into Gaussian Splatting that jointly characterizes multi-exposure HDR capture and the sensor response. Evaluated on both Project Aria and Meta Quest 3 devices across diverse indoor and outdoor scenes under varying illumination, the method achieves a consistent 2 dB PSNR improvement: +1 dB from VIBA and a further +1 dB from the physics-based model. All code, evaluation datasets, and recording profiles are publicly released.
📝 Abstract
In this paper, we investigate the challenges of using egocentric devices to photorealistically reconstruct a scene in high dynamic range. Existing methods typically rely on the frame-rate 6DoF poses estimated by the device's visual-inertial odometry system, which neglects details crucial for pixel-accurate reconstruction. This study presents two significant findings. First, in contrast to mainstream work that treats the RGB camera as a global-shutter, frame-rate camera, we emphasize the importance of employing visual-inertial bundle adjustment (VIBA) to calibrate the precise timestamps and motion of the rolling-shutter RGB camera in a high-frequency trajectory format, which ensures an accurate calibration of the camera's physical properties. Second, we incorporate a physical image formation model into Gaussian Splatting that effectively accounts for the sensor characteristics, including the rolling-shutter effect of the RGB camera and the dynamic range measured by the sensor. Our proposed formulation is applicable to the widely used variants of the Gaussian Splatting representation. We conduct a comprehensive evaluation of our pipeline using the open-source Project Aria device under diverse indoor and outdoor lighting conditions, and further validate it on a Meta Quest 3 device. Across all experiments, we observe a consistent +1 dB PSNR improvement from incorporating VIBA, with an additional +1 dB achieved through our proposed image formation model. Our complete implementation, evaluation datasets, and recording profile are available at http://www.projectaria.com/photoreal-reconstruction/
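To make the two sensor effects in the abstract concrete, here is a minimal, hypothetical sketch of a rolling-shutter and image formation model. It is not the paper's actual pipeline: the per-row timestamp formula, the exposure-plus-gamma response standing in for a learned camera response function, and all names (`row_capture_times`, `render_ldr`, `readout_time`, `white_level`) are illustrative assumptions.

```python
import numpy as np

def row_capture_times(frame_start, readout_time, num_rows):
    """Rolling shutter: each image row is exposed at a slightly different
    time. Assuming a linear readout, row r is captured at
    frame_start + (r / num_rows) * readout_time. A high-frequency trajectory
    (e.g. from VIBA) would be sampled at these per-row times instead of one
    pose per frame."""
    return frame_start + (np.arange(num_rows) / num_rows) * readout_time

def render_ldr(radiance, exposure_s, gamma=2.2, white_level=1.0):
    """Toy image formation: scale linear HDR radiance by exposure time,
    clip to the sensor's dynamic range, then apply a gamma curve as a
    stand-in for the camera response, quantizing to 8-bit LDR values."""
    irradiance = np.clip(radiance * exposure_s / white_level, 0.0, 1.0)
    return np.round(255.0 * irradiance ** (1.0 / gamma)).astype(np.uint8)
```

Under this kind of model, the same HDR scene rendered with different `exposure_s` values produces different LDR observations, which is what lets multi-exposure captures constrain a single HDR reconstruction.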