SpotDiffusion: A Fast Approach For Seamless Panorama Generation Over Time

📅 2024-07-22
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing panoramic image generation methods such as MultiDiffusion and SyncDiffusion merge many overlapping diffusion paths and average their predictions at every denoising step, which incurs high computational overhead and slow inference, making it hard to achieve high resolution and global consistency at once. SpotDiffusion removes the need for overlap entirely: it denoises non-overlapping windows whose positions shift over time, so seams introduced at one timestep fall inside a window at the next timestep and are corrected there. Built on pretrained diffusion models, this sliding-window schedule yields coherent, high-resolution panoramas in fewer overall steps. Experiments show faster inference than MultiDiffusion, SyncDiffusion, and StitchDiffusion while producing comparable or better image quality and visual seamlessness.

📝 Abstract
Generating high-resolution images with generative models has recently been made widely accessible by leveraging diffusion models pre-trained on large-scale datasets. Various techniques, such as MultiDiffusion and SyncDiffusion, have further pushed image generation beyond training resolutions, i.e., from square images to panorama, by merging multiple overlapping diffusion paths or employing gradient descent to maintain perceptual coherence. However, these methods suffer from significant computational inefficiencies due to generating and averaging numerous predictions, which is required in practice to produce high-quality and seamless images. This work addresses this limitation and presents a novel approach that eliminates the need to generate and average numerous overlapping denoising predictions. Our method shifts non-overlapping denoising windows over time, ensuring that seams in one timestep are corrected in the next. This results in coherent, high-resolution images with fewer overall steps. We demonstrate the effectiveness of our approach through qualitative and quantitative evaluations, comparing it with MultiDiffusion, SyncDiffusion, and StitchDiffusion. Our method offers several key benefits, including improved computational efficiency and faster inference times while producing comparable or better image quality. Link to code https://github.com/stanifrolov/spotdiffusion
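The core mechanism described in the abstract — denoising non-overlapping windows whose offsets shift over time, so that seams from one timestep land inside a window at a later timestep — can be sketched as follows. This is a minimal illustration, not the authors' implementation: `denoise_tile` is a hypothetical stand-in for a pretrained diffusion denoising step, and the per-step random offset with circular wrap-around is one plausible shifting scheme.

```python
import numpy as np

def denoise_tile(tile, t):
    # Hypothetical stand-in for one denoising step of a pretrained
    # diffusion model applied to a single window of the latent.
    return tile * 0.9

def spot_denoise(latent, window, num_steps, rng):
    """Shifted non-overlapping window denoising (sketch).

    At each timestep the panorama latent is partitioned into
    non-overlapping windows whose start offset is re-randomized,
    so seams between windows at one step are interior to some
    window at a later step and get corrected there.
    """
    width = latent.shape[-1]
    for t in range(num_steps, 0, -1):
        offset = int(rng.integers(0, window))  # fresh shift per step
        # Roll so the windows tile the (circular) panorama exactly,
        # with no overlap and no averaging of duplicate predictions.
        shifted = np.roll(latent, -offset, axis=-1)
        for start in range(0, width, window):
            tile = shifted[..., start:start + window]
            shifted[..., start:start + window] = denoise_tile(tile, t)
        latent = np.roll(shifted, offset, axis=-1)  # undo the shift
    return latent
```

Because each latent position is denoised exactly once per step, the cost per step matches that of a single non-panoramic pass, which is where the claimed efficiency gain over overlap-and-average schemes comes from.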
Problem

Research questions and friction points this paper is trying to address.

Generation beyond training resolution (square to panorama)
High computational cost
Slow inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

SpotDiffusion
Efficiency Enhancement
Image Coherence