StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space

📅 2025-12-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses monocular-to-stereoscopic image generation without explicit depth estimation or geometric warping. We propose an end-to-end diffusion-based paradigm that operates in a canonical rectified space, where view-conditioned embeddings directly model disparity distributions and occlusion-aware inpainting—enabling fully differentiable, parameterized training. To rigorously evaluate perceptual fidelity, we introduce a leakage-immune assessment protocol emphasizing downstream metrics: iSQoE (integrated Stereo Quality of Experience) and MEt3R (Multi-scale Edge-aware Temporal 3D Reconstruction error). Experiments demonstrate consistent superiority over warp-and-inpaint, latent-warping, and warped-conditioning baselines across both hierarchical and non-Lambertian scenes. Our method achieves state-of-the-art performance in disparity sharpness and geometric consistency, marking the first successful realization of high-fidelity stereo synthesis without depth prediction or explicit warping operations.

Technology Category

Application Category

📝 Abstract
We introduce StereoSpace, a diffusion-based framework for monocular-to-stereo synthesis that models geometry purely through viewpoint conditioning, without explicit depth or warping. A canonical rectified space and the conditioning guide the generator to infer correspondences and fill disocclusions end-to-end. To ensure fair and leakage-free evaluation, we introduce an end-to-end protocol that excludes any ground truth or proxy geometry estimates at test time. The protocol emphasizes metrics reflecting downstream relevance: iSQoE for perceptual comfort and MEt3R for geometric consistency. StereoSpace surpasses other methods from the warp&inpaint, latent-warping, and warped-conditioning categories, achieving sharp parallax and strong robustness on layered and non-Lambertian scenes. This establishes viewpoint-conditioned diffusion as a scalable, depth-free solution for stereo generation.
Problem

Research questions and friction points this paper is trying to address.

Generates stereo images without depth data
Ensures geometric consistency and perceptual comfort
Outperforms existing methods in robustness and quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion-based monocular-to-stereo synthesis without depth
Canonical rectified space guides end-to-end correspondence inference
Viewpoint-conditioned diffusion eliminates explicit warping and geometry estimates
🔎 Similar Papers
No similar papers found.