Text2Stereo: Repurposing Stable Diffusion for Stereo Generation with Consistency Rewards

📅 2025-05-27
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
The scarcity of large-baseline stereo image data hinders the training of dedicated diffusion models for text-driven stereo generation. To address this, the work pioneers the adaptation of Stable Diffusion to stereo image synthesis. The authors introduce a stereo consistency reward function that jointly optimizes disparity-based geometric consistency and text-image alignment. The method integrates efficient LoRA-based fine-tuning, reinforcement learning with reward modeling, a prompt alignment loss, and dual-view consistency constraints. Extensive experiments demonstrate that the approach generates high-fidelity, geometrically coherent stereo images across diverse scenes, with quantitative and qualitative evaluations showing significant improvements over existing text-to-stereo methods. Moreover, the framework exhibits strong zero-shot generalization without task-specific retraining.
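The summary does not give the exact form of the stereo consistency reward, but the idea of jointly rewarding disparity-based geometric consistency and text-image alignment can be sketched roughly as follows. The warping helper, the photometric error term, and the weighting `lam` are all illustrative assumptions, not the paper's actual formulation; the `alignment_score` stands in for an external text-image score (e.g. from a CLIP-style model).

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Warp the right view into the left view using per-pixel horizontal
    disparity (hypothetical helper; the paper's warping operator is not
    specified in this summary). Assumes rectified stereo, so a left pixel
    at column x corresponds to the right pixel at column x - d."""
    h, w = right.shape[:2]
    xs = np.arange(w)[None, :].repeat(h, axis=0)
    src_x = np.clip(xs - np.round(disparity).astype(int), 0, w - 1)
    rows = np.arange(h)[:, None].repeat(w, axis=1)
    return right[rows, src_x]

def stereo_consistency_reward(left, right, disparity, alignment_score, lam=0.5):
    """Combine photometric left-right consistency with a text-image
    alignment score into a single scalar reward. The blend weight `lam`
    is an assumption, not a value from the paper."""
    warped = warp_right_to_left(right, disparity)
    photometric_error = np.abs(left - warped).mean()
    geometric_reward = 1.0 - min(photometric_error, 1.0)
    return lam * geometric_reward + (1.0 - lam) * alignment_score
```

A reward of this shape could then drive reinforcement-learning-style fine-tuning: generated stereo pairs that are both geometrically consistent under the estimated disparity and well aligned with the prompt receive higher scores.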

📝 Abstract
In this paper, we propose a novel diffusion-based approach to generate stereo images given a text prompt. Since stereo image datasets with large baselines are scarce, training a diffusion model from scratch is not feasible. Therefore, we propose leveraging the strong priors learned by Stable Diffusion and fine-tuning it on stereo image datasets to adapt it to the task of stereo generation. To improve stereo consistency and text-to-image alignment, we further tune the model using prompt alignment and our proposed stereo consistency reward functions. Comprehensive experiments demonstrate the superiority of our approach in generating high-quality stereo images across diverse scenarios, outperforming existing methods.
Problem

Research questions and friction points this paper is trying to address.

Generate stereo images from text prompts
Leverage Stable Diffusion for stereo generation
Improve stereo consistency and text alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tune Stable Diffusion for stereo generation
Use stereo consistency reward functions
Improve text-to-image alignment via tuning
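The LoRA-based fine-tuning mentioned in the summary can be illustrated with a minimal low-rank adapter: the pretrained weight stays frozen while only a small rank-r update is trained. This is a generic sketch of the LoRA technique, not the paper's implementation; the rank, scaling, and placement inside Stable Diffusion's attention layers are not specified in this summary.

```python
import numpy as np

class LoRALinear:
    """Minimal LoRA adapter: frozen weight W plus a trainable low-rank
    update scaled by alpha / rank. Illustrative sketch only."""
    def __init__(self, W, rank=4, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                           # frozen pretrained weight
        self.A = rng.normal(scale=0.01, size=(rank, d_in))   # trainable down-projection
        self.B = np.zeros((d_out, rank))                     # trainable up-projection, zero-init
        self.scale = alpha / rank

    def forward(self, x):
        # y = W x + (alpha / r) * B (A x); only A and B would be updated
        return self.W @ x + self.scale * (self.B @ (self.A @ x))
```

Because B is initialized to zero, the adapted layer exactly reproduces the pretrained layer at the start of fine-tuning, which is what makes LoRA an efficient way to adapt a large pretrained model such as Stable Diffusion to a new task like stereo generation.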
Authors
Aakash Garg, Texas A&M University
Libing Zeng, Texas A&M University (Computer Graphics, Computer Vision)
Andrii Tsarov, Leia Inc.
N. Kalantari, Texas A&M University