🤖 AI Summary
To address the lack of multi-view constraints and the difficulty of handling dynamic scenes in monocular 3D reconstruction, this paper proposes a mirror-reflection-based virtual stereo reconstruction method. It models specular reflections as physically interpretable virtual cameras, derives their poses via mirror geometry, and synthesizes geometrically consistent virtual views directly in the pixel domain. A symmetry-aware loss function is introduced to jointly optimize virtual camera poses and depth estimation, enabling high-fidelity 3D reconstruction from a single input image. The method supports frame-wise geometric recovery for both static and dynamic scenes without requiring auxiliary sensors or scene-specific priors. Evaluated on 16 custom Blender-synthetic scenes and on real-world data, it demonstrates significant improvements in generalizability and robustness over existing monocular approaches.
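For intuition, the mirror-geometry step summarized above has a simple closed form: reflecting the real camera's extrinsics across the mirror plane yields the virtual camera's pose. The sketch below shows that standard construction in NumPy, assuming a camera-to-world matrix and a plane parameterized as n·x = d; the function and variable names are illustrative, not the paper's API.

```python
import numpy as np

def reflect_pose_across_mirror(cam_to_world, n, d):
    """Reflect a real camera pose across a mirror plane to obtain the
    virtual camera pose (a minimal sketch of the standard mirror-geometry
    construction; names are illustrative, not the paper's implementation).

    cam_to_world : (4, 4) camera-to-world extrinsics of the real camera
    n            : (3,) normal of the mirror plane in world coordinates
    d            : plane offset, so the mirror satisfies n . x = d
    """
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)

    # Homogeneous reflection about the plane n . x = d:
    #   x' = (I - 2 n n^T) x + 2 d n
    M = np.eye(4)
    M[:3, :3] = np.eye(3) - 2.0 * np.outer(n, n)
    M[:3, 3] = 2.0 * d * n

    # Composing the reflection with the real pose gives the virtual
    # camera-to-world transform. Its rotation block has determinant -1
    # (a handedness flip), which in practice is compensated by mirroring
    # the virtual image horizontally before stereo matching.
    return M @ cam_to_world

# Minimal usage: a camera at the origin, mirror plane at x = 1.
T_real = np.eye(4)
T_virtual = reflect_pose_across_mirror(T_real, n=[1.0, 0.0, 0.0], d=1.0)
print(T_virtual)  # virtual camera sits at (2, 0, 0), as expected
```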
📝 Abstract
Mirror reflections are common in everyday environments and can provide stereo information within a single capture, as the real and reflected virtual views are visible simultaneously. We exploit this property by treating the reflection as an auxiliary view and designing a transformation that constructs a physically valid virtual camera, allowing direct pixel-domain generation of the virtual view while adhering to the real-world imaging process. This enables a multi-view stereo setup from a single image, simplifying the imaging process and making it compatible with powerful feed-forward reconstruction models for generalizable and robust 3D reconstruction. To further exploit the geometric symmetry introduced by mirrors, we propose a symmetry-aware loss to refine pose estimation. Our framework also naturally extends to dynamic scenes, where each frame contains a mirror reflection, enabling efficient per-frame geometry recovery. For quantitative evaluation, we provide a fully customizable synthetic dataset of 16 Blender scenes, each with ground-truth point clouds and camera poses. Extensive experiments on both real-world and synthetic data demonstrate the effectiveness of our method.
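As a rough illustration of how a symmetry-aware objective could be instantiated (an assumption for exposition, not the paper's actual loss), one differentiable option is to reflect the predicted points across the estimated mirror plane and penalize their chamfer distance to the original cloud; gradients with respect to the plane parameters can then refine the mirror plane and, through it, the virtual camera pose.

```python
import torch

def symmetry_consistency_loss(points, n, d):
    """Purely illustrative symmetry-consistency term (not the paper's loss):
    reflect the predicted points across the estimated mirror plane and
    penalize the symmetric chamfer distance to the original cloud.

    points : (N, 3) predicted 3D points
    n      : (3,) mirror normal (optimizable)
    d      : ()  mirror offset, plane n . x = d (optimizable)
    """
    n_unit = n / n.norm()
    # Signed distance of every point to the plane, then reflect:
    #   x' = x - 2 (n . x - d) n
    signed = points @ n_unit - d                    # (N,)
    reflected = points - 2.0 * signed[:, None] * n_unit

    # Symmetric chamfer distance between the cloud and its reflection.
    dists = torch.cdist(reflected, points)          # (N, N)
    return dists.min(dim=1).values.mean() + dists.min(dim=0).values.mean()

# Gradients w.r.t. the plane parameters can refine the mirror estimate.
n = torch.tensor([1.0, 0.0, 0.0], requires_grad=True)
d = torch.tensor(1.0, requires_grad=True)
pts = torch.randn(256, 3)
loss = symmetry_consistency_loss(pts, n, d)
loss.backward()
```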