StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors

📅 2025-12-18
🤖 AI Summary
Conventional Depth-Warp-Inpaint (DWI) pipelines for monocular-to-stereoscopic video conversion suffer from error propagation, depth ambiguity, and incompatibility between the parallel and converged stereo formats. To address these limitations, this paper proposes an end-to-end, single-stage stereoscopic synthesis framework. Key contributions include: (i) UniStereo, the first large-scale, unified dataset covering both stereo formats; (ii) a lightweight, feed-forward architecture that bypasses explicit depth estimation and iterative sampling, incorporating a learnable domain switcher for automatic format adaptation; and (iii) joint optimization via prior-guided modeling, domain-adaptive networks, and cycle-consistency constraints to enhance geometric coherence and texture fidelity. Extensive experiments show that the method achieves state-of-the-art visual quality and inference speed, enabling real-time, high-fidelity generation of both parallel and converged stereo pairs.

📝 Abstract
The rapid growth of stereoscopic displays, including VR headsets and 3D cinemas, has led to increasing demand for high-quality stereo video content. However, producing 3D videos remains costly and complex, while automatic monocular-to-stereo conversion is hindered by the limitations of the multi-stage "Depth-Warp-Inpaint" (DWI) pipeline. This paradigm suffers from error propagation, depth ambiguity, and format inconsistency between parallel and converged stereo configurations. To address these challenges, we introduce UniStereo, the first large-scale unified dataset for stereo video conversion, covering both stereo formats to enable fair benchmarking and robust model training. Building upon this dataset, we propose StereoPilot, an efficient feed-forward model that directly synthesizes the target view without relying on explicit depth maps or iterative diffusion sampling. Equipped with a learnable domain switcher and a cycle consistency loss, StereoPilot adapts seamlessly to different stereo formats and achieves improved consistency. Extensive experiments demonstrate that StereoPilot significantly outperforms state-of-the-art methods in both visual fidelity and computational efficiency. Project page: https://hit-perfect.github.io/StereoPilot/.
Problem

Research questions and friction points this paper is trying to address.

Addresses high-quality stereo video conversion from monocular inputs.
Overcomes the error propagation and depth ambiguity of the multi-stage Depth-Warp-Inpaint (DWI) pipeline.
Unifies the parallel and converged stereo formats for consistent, efficient view synthesis.
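
To make the warp stage of a Depth-Warp-Inpaint pipeline concrete, here is a minimal pure-Python sketch. It is not code from the paper. It assumes a parallel stereo setup with non-negative integer disparities, where a left-view pixel at column x lands at column x - d in the right view. The function name `forward_warp_row` and the toy values are hypothetical. Columns that no source pixel reaches are the holes the inpainting stage has to fill, and any error in the estimated disparity misplaces pixels before inpainting runs.

```python
def forward_warp_row(row, disparity):
    """Forward-warp one left-view row into the right view (illustrative only).

    Assumes non-negative integer disparities and a parallel camera setup.
    Returns the warped row and a list of booleans marking holes.
    """
    width = len(row)
    warped = [None] * width          # None marks "no source pixel landed here"
    best_disp = [-1] * width         # largest disparity written to each column
    for x in range(width):
        d = disparity[x]
        target = x - d               # right-view column for this left-view pixel
        # When two pixels collide, the nearer one (larger disparity) wins.
        if 0 <= target < width and d > best_disp[target]:
            warped[target] = row[x]
            best_disp[target] = d
    holes = [value is None for value in warped]
    return warped, holes


# Toy row: columns 2 and 3 belong to a nearer object with disparity 2.
row = [10, 20, 30, 40, 50, 60]
disparity = [0, 0, 2, 2, 0, 0]
warped, holes = forward_warp_row(row, disparity)
print(warped)  # [30, 40, None, None, 50, 60]
print(holes)   # [False, False, True, True, False, False]
```

In this toy row the nearer object shifts left, covers two background pixels, and leaves two holes behind it. A feed-forward model that synthesizes the target view directly avoids producing this intermediate result.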
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified dataset covering both stereo formats.
Feed-forward model synthesizes target view directly.
Learnable domain switcher adapts to different formats.
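
The summary does not give the internals of the domain switcher or the exact form of the cycle consistency loss, so the following is only a rough pure-Python sketch of the two ideas under stated assumptions. It assumes the switcher is one learnable vector per stereo format added to the features, and that the cycle term is a mean absolute error between a left view and its left-to-right-to-left reconstruction. `DomainSwitcher`, `cycle_consistency_loss`, and the toy "generators" are hypothetical names, not the authors' implementation.

```python
FORMATS = {"parallel": 0, "converged": 1}


class DomainSwitcher:
    """Assumed design: one learnable vector per stereo format."""

    def __init__(self, dim):
        # In a real model these values would be trained with the network.
        self.table = [[0.0] * dim for _ in FORMATS]

    def __call__(self, features, stereo_format):
        vec = self.table[FORMATS[stereo_format]]
        return [f + v for f, v in zip(features, vec)]


def l1(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(left, left_to_right, right_to_left):
    """Penalize the gap between `left` and its round-trip reconstruction."""
    return l1(left, right_to_left(left_to_right(left)))


# Toy stand-ins for the two view-synthesis directions: circular shifts.
def roll_left(v):
    return v[1:] + v[:1]


def roll_right(v):
    return v[-1:] + v[:-1]


left = [1.0, 2.0, 3.0, 4.0]
perfect = cycle_consistency_loss(left, roll_left, roll_right)     # exact inverse
imperfect = cycle_consistency_loss(left, roll_left, lambda v: v)  # no inverse

switcher = DomainSwitcher(dim=2)
switcher.table[FORMATS["converged"]] = [0.5, -0.5]  # pretend these were learned
shifted = switcher([1.0, 1.0], "converged")
```

When the reverse mapping exactly undoes the forward one, the loss is zero, and it grows as the round trip drifts from the input. The same shared model can be steered toward either format by selecting the corresponding switcher vector.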