ControlVP: Interactive Geometric Refinement of AI-Generated Images with Consistent Vanishing Points

📅 2025-12-08
🤖 AI Summary
AI-generated images (e.g., from Stable Diffusion) often exhibit geometric distortions caused by inconsistent vanishing points, which is particularly damaging to spatial realism in architectural scenes. To address this, we propose ControlVP, a user-guided geometric refinement framework that extends a pre-trained diffusion model with structural guidance derived from building contours and adds explicit geometric constraints that encourage image edges to align with perspective cues. The method improves global geometric consistency while keeping visual fidelity comparable to the baselines, which benefits downstream tasks that depend on accurate spatial structure, such as image-to-3D reconstruction. The source code and dataset are publicly available.

📝 Abstract
Recent text-to-image models, such as Stable Diffusion, have achieved impressive visual quality, yet they often suffer from geometric inconsistencies that undermine the structural realism of generated scenes. One prominent issue is vanishing point inconsistency, where projections of parallel lines fail to converge correctly in 2D space. This leads to structurally implausible geometry that degrades spatial realism, especially in architectural scenes. We propose ControlVP, a user-guided framework for correcting vanishing point inconsistencies in generated images. Our approach extends a pre-trained diffusion model by incorporating structural guidance derived from building contours. We also introduce geometric constraints that explicitly encourage alignment between image edges and perspective cues. Our method enhances global geometric consistency while maintaining visual fidelity comparable to the baselines. This capability is particularly valuable for applications that require accurate spatial structure, such as image-to-3D reconstruction. The dataset and source code are available at https://github.com/RyotaOkumura/ControlVP.
Problem

Research questions and friction points this paper is trying to address.

Corrects vanishing point inconsistencies in AI-generated images
Enhances geometric consistency while preserving visual fidelity
Improves structural realism for applications like 3D reconstruction
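
To make "vanishing point inconsistency" concrete, the sketch below estimates a single vanishing point from image line segments that should be parallel in 3D and reports how far the worst segment's line misses it. This is only an illustration of the underlying geometry, not the paper's method: the function names `line_through` and `estimate_vanishing_point`, the least-squares formulation, and the sample coordinates are all assumptions introduced here.

```python
import math

def line_through(p, q):
    """Homogeneous line (a, b, c) through 2D points p and q, scaled so that
    |a*x + b*y + c| is the Euclidean distance from (x, y) to the line.
    Assumes p != q."""
    (x1, y1), (x2, y2) = p, q
    a, b = y1 - y2, x2 - x1
    c = x1 * y2 - x2 * y1
    n = math.hypot(a, b)
    return a / n, b / n, c / n

def estimate_vanishing_point(segments):
    """Least-squares point closest to all segment lines.

    Returns ((x, y), residual), where residual is the largest distance
    from the estimated point to any of the lines. A residual near zero
    means the segments agree on one vanishing point."""
    lines = [line_through(p, q) for p, q in segments]
    saa = sum(a * a for a, _, _ in lines)
    sab = sum(a * b for a, b, _ in lines)
    sbb = sum(b * b for _, b, _ in lines)
    sac = sum(a * c for a, _, c in lines)
    sbc = sum(b * c for _, b, c in lines)
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("lines are parallel in the image; vanishing point is at infinity")
    # Solve the 2x2 normal equations with Cramer's rule.
    x = (-sac * sbb + sab * sbc) / det
    y = (-saa * sbc + sab * sac) / det
    residual = max(abs(a * x + b * y + c) for a, b, c in lines)
    return (x, y), residual

# Three segments whose extensions all pass through (100, 50).
consistent = [((0, 0), (50, 25)), ((0, 100), (50, 75)), ((0, 50), (50, 50))]
# Same scene, but the third edge is tilted and no longer converges with the others.
inconsistent = [((0, 0), (50, 25)), ((0, 100), (50, 75)), ((0, 50), (50, 60))]

vp_ok, res_ok = estimate_vanishing_point(consistent)
vp_bad, res_bad = estimate_vanishing_point(inconsistent)
```

In this toy setup the consistent set yields a residual of essentially zero, while the tilted edge in the second set leaves a residual of several pixels, which is the kind of discrepancy the listed problem statements refer to.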
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends diffusion models with structural guidance from contours
Introduces geometric constraints aligning edges with perspective cues
Enhances global consistency while preserving visual fidelity
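
One way to picture a constraint that aligns edges with perspective cues is a penalty that grows as an edge's direction deviates from the direction toward a given vanishing point. The sketch below is a hypothetical formulation for illustration only; the source does not specify the form of ControlVP's constraints, and the name `alignment_penalty` and the squared-sine measure are assumptions made here.

```python
import math

def alignment_penalty(segments, vp):
    """Mean squared sine of the angle between each segment and the ray
    from the segment's midpoint to the vanishing point vp.

    0.0 means every segment points exactly at vp; 1.0 means every segment
    is perpendicular to that direction. Assumes segments have non-zero
    length and that no midpoint coincides with vp."""
    total = 0.0
    for (x1, y1), (x2, y2) in segments:
        dx, dy = x2 - x1, y2 - y1                  # segment direction
        mx, my = (x1 + x2) / 2.0, (y1 + y2) / 2.0  # segment midpoint
        vx, vy = vp[0] - mx, vp[1] - my            # direction toward vp
        cross = dx * vy - dy * vx                  # |d| * |v| * sin(angle)
        total += (cross / (math.hypot(dx, dy) * math.hypot(vx, vy))) ** 2
    return total / len(segments)

vp = (100.0, 50.0)
aligned = [((0, 0), (50, 25)), ((0, 100), (50, 75))]  # both point at vp
misaligned = [((0, 0), (0, 10))]                      # vertical edge at x = 0

penalty_aligned = alignment_penalty(aligned, vp)
penalty_misaligned = alignment_penalty(misaligned, vp)
```

Because the measure depends only on the angle, it does not favor long edges over short ones; a practical system would likely weight edges by length or detection confidence, which is left out here.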