π€ AI Summary
To address global spatial layout inconsistency in high-resolution panoramic image generation, this paper proposes a multi-scale diffusion framework that jointly models geometry and semantics via cross-resolution structural prior transfer. The core contribution is a novel multi-scale gradient-guided mechanism, which explicitly injects low-resolution layout constraints into the high-resolution generation process, integrated with multi-scale feature distillation, cross-resolution gradient backpropagation, and structure-aware loss. The method preserves seamless structural coherence and natural transitions even at 16K resolution. Quantitative experiments demonstrate a 23.6% reduction in FrΓ©chet Inception Distance (FID) and a 31.4% improvement in layout consistency metrics, significantly outperforming state-of-the-art approaches.
π Abstract
Diffusion models have recently gained recognition for generating diverse and high-quality content, especially in image synthesis. These models excel not only in creating fixed-size images but also in producing panoramic images. However, existing methods often struggle with spatial layout consistency when producing high-resolution panoramas due to the lack of guidance on the global image layout. This paper introduces the Multi-Scale Diffusion (MSD), an optimized framework that extends the panoramic image generation framework to multiple resolution levels. Our method leverages gradient descent techniques to incorporate structural information from low-resolution images into high-resolution outputs. Through comprehensive qualitative and quantitative evaluations against prior work, we demonstrate that our approach significantly improves the coherence of high-resolution panorama generation.