🤖 AI Summary
Existing methods for stereo video generation rely on high-quality monocular input, rendering them ineffective for low-quality videos. This paper proposes the first end-to-end framework for stereo video generation and joint dual-view restoration tailored to degraded monocular videos, unifying stereo synthesis and collaborative inpainting within a single video diffusion model. Key innovations include optical-flow-guided view warping, warped mask conditioning, and degradation-aware fine-tuning, enabling effective training solely on small-scale synthetic data while generalizing robustly to real-world low-quality videos. Under low-resolution input conditions, the method significantly outperforms state-of-the-art approaches on quantitative metrics including SSIM and Stereo Consistency Score. According to the authors, it is the first to achieve high-fidelity, stereo-consistent video generation directly from severely degraded monocular inputs.
📝 Abstract
Stereo video generation has been gaining increasing attention with recent advancements in video diffusion models. However, most existing methods focus on generating 3D stereoscopic videos from monocular 2D videos. These approaches typically assume that the input monocular video is of high quality, making the task primarily about inpainting occluded regions in the warped video while preserving the visible areas. In this paper, we introduce a new pipeline that not only generates stereo videos but also enhances both left-view and right-view videos consistently with a single model. Our approach achieves this by fine-tuning the model on degraded data for restoration, as well as conditioning the model on warped masks for consistent stereo generation. As a result, our method can be fine-tuned on a relatively small synthetic stereo video dataset and applied to low-quality real-world videos, performing both stereo video generation and restoration. Experiments demonstrate that our method outperforms existing approaches both qualitatively and quantitatively in stereo video generation from low-resolution inputs.
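To make the warping-and-masking idea described above concrete, the following is a minimal, hypothetical sketch of disparity-based view warping: each left-view pixel is shifted horizontally to synthesize the right view, and a binary mask records which target pixels received a source pixel. The holes (mask = False) are the regions a diffusion model would be conditioned to inpaint. All names here are illustrative; this is not the paper's actual implementation, which uses optical-flow-guided warping inside a video diffusion model.

```python
import numpy as np

def warp_view_with_mask(left, disparity):
    """Warp a left-view image into a synthetic right view using per-pixel
    horizontal disparity (forward warping). Returns the warped view and a
    boolean mask that is True where a source pixel landed; False entries
    are disocclusion holes left for the generative model to fill.
    Hypothetical illustration only."""
    h, w = left.shape[:2]
    right = np.zeros_like(left)
    mask = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            # A positive disparity moves content left in the right view.
            xt = x - int(round(disparity[y, x]))
            if 0 <= xt < w:
                right[y, xt] = left[y, x]
                mask[y, xt] = True
    return right, mask

# With a constant disparity of 3, the rightmost 3 columns of the
# synthesized view are never written and show up as holes in the mask.
left = np.arange(48, dtype=float).reshape(6, 8)
disp = np.full((6, 8), 3.0)
right, mask = warp_view_with_mask(left, disp)
```

In practice the disparity varies with scene depth, so holes concentrate along depth discontinuities rather than the image border; the warped mask conditioning described in the abstract tells the model exactly which pixels to trust and which to synthesize.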