🤖 AI Summary
Conventional Depth-Warp-Inpaint (DWI) pipelines for monocular-to-stereo video conversion suffer from error propagation, depth ambiguity, and inconsistent handling of the parallel and converged stereo formats. To address these limitations, this paper proposes StereoPilot, an end-to-end, single-stage stereoscopic synthesis framework. Key contributions include: (i) UniStereo, the first large-scale, unified dataset covering both stereo formats; (ii) a lightweight, feed-forward architecture that bypasses explicit depth estimation and iterative sampling and incorporates a learnable domain switcher for format adaptation; and (iii) joint optimization via prior-guided modeling, domain-adaptive networks, and cycle-consistency constraints to improve geometric coherence and texture fidelity. Extensive experiments show that the method outperforms state-of-the-art approaches in both visual quality and inference speed when generating parallel and converged stereo pairs.
📝 Abstract
The rapid growth of stereoscopic displays, including VR headsets and 3D cinemas, has led to increasing demand for high-quality stereo video content. However, producing 3D videos remains costly and complex, while automatic monocular-to-stereo conversion is hindered by the limitations of the multi-stage "Depth-Warp-Inpaint" (DWI) pipeline. This paradigm suffers from error propagation, depth ambiguity, and format inconsistency between parallel and converged stereo configurations. To address these challenges, we introduce UniStereo, the first large-scale unified dataset for stereo video conversion, covering both stereo formats to enable fair benchmarking and robust model training. Building upon this dataset, we propose StereoPilot, an efficient feed-forward model that directly synthesizes the target view without relying on explicit depth maps or iterative diffusion sampling. Equipped with a learnable domain switcher and a cycle consistency loss, StereoPilot adapts seamlessly to different stereo formats and achieves improved consistency. Extensive experiments demonstrate that StereoPilot significantly outperforms state-of-the-art methods in both visual fidelity and computational efficiency. Project page: https://hit-perfect.github.io/StereoPilot/.
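
To make the idea of a cycle consistency loss concrete, the sketch below shows one common form of it: synthesize the right view from the left, map it back to the left, and penalize the difference from the original frame. This is only an illustration under stated assumptions, not StereoPilot's actual loss or architecture. The `ViewModel` signature, the `stereo_format` string argument (a stand-in for the learnable domain switcher), the L1 distance, and the toy models are all hypothetical.

```python
from typing import Callable, List

# A grayscale frame as rows of pixel intensities (illustrative representation).
Frame = List[List[float]]

# Hypothetical model signature, not the paper's API:
# (source frame, target eye, stereo format) -> synthesized frame.
ViewModel = Callable[[Frame, str, str], Frame]


def l1_distance(a: Frame, b: Frame) -> float:
    """Mean absolute difference between two equally sized frames."""
    total = sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    count = sum(len(row) for row in a)
    return total / count


def cycle_consistency_loss(model: ViewModel, left: Frame, stereo_format: str) -> float:
    """Left -> right -> left round trip, compared against the original left frame."""
    right_pred = model(left, "right", stereo_format)
    left_cycle = model(right_pred, "left", stereo_format)
    return l1_distance(left, left_cycle)


def toy_shift_model(frame: Frame, target_eye: str, stereo_format: str) -> Frame:
    """Toy stand-in for a learned model: a circular horizontal shift.

    The shift size depends on the format flag, loosely mimicking how a
    format switch could change the model's behavior (an assumption).
    """
    shift = {"parallel": 1, "converged": 2}[stereo_format]
    if target_eye == "left":
        shift = -shift
    shifted = []
    for row in frame:
        k = (-shift) % len(row)  # rotate right by `shift` pixels
        shifted.append(row[k:] + row[:k])
    return shifted


def toy_lossy_model(frame: Frame, target_eye: str, stereo_format: str) -> Frame:
    """Toy model that discards all content, so the round trip cannot recover it."""
    return [[0.0 for _ in row] for row in frame]


frame = [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]]
consistent = cycle_consistency_loss(toy_shift_model, frame, "parallel")
inconsistent = cycle_consistency_loss(toy_lossy_model, frame, "parallel")
```

A model whose two directions invert each other exactly gives zero loss, while one that loses information on the round trip is penalized. In training, a term like this would typically be added to a reconstruction loss against the ground-truth target view.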