🤖 AI Summary
The scarcity of large-baseline stereo image data hinders the training of dedicated diffusion models for text-driven stereo generation. To address this, we pioneer the adaptation of Stable Diffusion to stereo image synthesis, introducing a stereo consistency reward function that jointly optimizes disparity-based geometric consistency and text-image alignment. Our method integrates efficient LoRA-based fine-tuning, reinforcement learning with reward modeling, a prompt alignment loss, and dual-view consistency constraints. Extensive experiments show that our approach generates high-fidelity, geometrically coherent stereo images across diverse scenes, with quantitative and qualitative evaluations demonstrating significant improvements over existing text-to-stereo methods. Moreover, the framework exhibits strong zero-shot generalization without task-specific retraining.
📝 Abstract
In this paper, we propose a novel diffusion-based approach that generates stereo images from a text prompt. Because stereo image datasets with large baselines are scarce, training a diffusion model from scratch is infeasible. We therefore leverage the strong priors learned by Stable Diffusion, fine-tuning it on stereo image datasets to adapt it to stereo generation. To improve stereo consistency and text-to-image alignment, we further tune the model using prompt alignment and our proposed stereo consistency reward functions. Comprehensive experiments demonstrate the superiority of our approach in generating high-quality stereo images across diverse scenarios, outperforming existing methods.
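The stereo consistency reward described above can be illustrated with a minimal sketch. This is an assumption-laden toy version, not the paper's implementation: it warps the right view to the left using a disparity map, scores photometric agreement, and mixes in a placeholder `align_score` (standing in for a text-image alignment score, e.g. from a pretrained vision-language model) with a hypothetical weight `lam`.

```python
import numpy as np

def warp_right_to_left(right, disparity):
    # For each left-view pixel (y, x), sample the right view at (y, x - d),
    # following the usual convention x_right = x_left - disparity.
    h, w = right.shape[:2]
    xs = np.tile(np.arange(w), (h, 1))
    ys = np.tile(np.arange(h)[:, None], (1, w))
    src = np.clip(xs - np.round(disparity).astype(int), 0, w - 1)
    return right[ys, src]

def stereo_consistency_reward(left, right, disparity):
    # Photometric error between the left view and the disparity-warped right
    # view, mapped to a reward in (0, 1]: lower error -> higher reward.
    warped = warp_right_to_left(right, disparity)
    err = np.mean(np.abs(left - warped))
    return float(np.exp(-err))

def combined_reward(left, right, disparity, align_score, lam=0.5):
    # Hypothetical weighted sum of geometric consistency and a text-image
    # alignment score supplied by an external model (here just a scalar).
    return lam * stereo_consistency_reward(left, right, disparity) \
        + (1.0 - lam) * align_score
```

For a perfectly consistent pair (the left view exactly equals the warped right view), the geometric term attains its maximum of 1, and the combined reward reduces to `lam + (1 - lam) * align_score`.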