Exploring Position Encoding in Diffusion U-Net for Training-free High-resolution Image Generation

📅 2025-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
In high-resolution image generation, pre-trained U-Net architectures suffer from positional distortion in the latent space due to zero-padding in convolutions, leading to repetitive patterns and structural incoherence during denoising. This degradation stems from inconsistent positional encoding across scales. To address this, we propose Progressive Boundary Complement (PBC), a training-free method that dynamically constructs virtual image boundaries within feature maps to rectify positional information propagation and enhance cross-scale consistency. PBC integrates seamlessly into standard diffusion frameworks without modifying network architecture or requiring retraining. Experiments demonstrate that PBC significantly improves structural integrity and content richness in generated high-resolution images, outperforming existing alignment-based approaches on multi-scale synthesis tasks.

📝 Abstract
Denoising higher-resolution latents with a pre-trained U-Net produces repetitive and disordered image patterns. Although recent studies try to improve generative quality by aligning the denoising process across the original and higher resolutions, the root cause of the suboptimal generation remains underexplored. Through a comprehensive analysis of position encoding in the U-Net, we attribute the problem to inconsistent position encoding, caused by the inadequate propagation of position information from zero-padding to latent features in convolution layers as resolution increases. To address this issue, we propose a novel training-free approach, the Progressive Boundary Complement (PBC) method. It creates dynamic virtual image boundaries inside the feature map to enhance position information propagation, enabling high-quality, content-rich high-resolution image synthesis. Extensive experiments demonstrate the superiority of our method.
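
The role of zero-padding described above can be illustrated with a small, self-contained sketch. It is not the authors' PBC implementation, and the function names `conv2d_zero_pad` and `count_boundary_blind` are hypothetical. It assumes a toy stack of 3x3 all-ones convolutions applied to a constant feature map, so that any variation in the output can only come from the padding. Pixels whose value equals the pure interior value have received no signal from the boundary.

```python
def conv2d_zero_pad(image, kernel):
    """Same-size 2D cross-correlation where out-of-bounds pixels count as zero."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - kh // 2, j + dj - kw // 2
                    # Zero-padding: positions outside the map contribute nothing.
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def count_boundary_blind(size, layers):
    """Count pixels of a constant size x size map that the padding never reaches."""
    kernel = [[1] * 3 for _ in range(3)]      # toy kernel (assumption)
    feat = [[1] * size for _ in range(size)]  # constant input: no content cues
    for _ in range(layers):
        feat = conv2d_zero_pad(feat, kernel)
    interior_value = 9 ** layers              # value when no padding was seen
    return sum(v == interior_value for row in feat for v in row)


if __name__ == "__main__":
    for size in (8, 16):
        blind = count_boundary_blind(size, layers=2)
        print(f"{size}x{size}: {blind} of {size * size} pixels carry no boundary signal")
```

With two layers, 16 of 64 pixels in the 8x8 map are untouched by the padding, against 144 of 256 in the 16x16 map. At a fixed depth, a larger share of the higher-resolution map receives no boundary-derived position cue, which is the kind of inadequate propagation the abstract refers to.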

Problem

Research questions and friction points this paper is trying to address.

Addresses repetitive patterns in high-resolution image generation
Explores inconsistent position encoding in U-Net denoising
Proposes training-free method for better position information propagation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive Boundary Complement enhances position encoding
Dynamic virtual boundaries improve image synthesis quality
Training-free method for high-resolution image generation

Feng Zhou
Beijing University of Posts and Telecommunications
Pu Cao
Beijing University of Posts and Telecommunications
Computer Vision
Yiyang Ma
DeepSeek-AI
Generative Models, Large Language Models
Lu Yang
Beijing University of Posts and Telecommunications
Jianqin Yin
Beijing University of Posts and Telecommunications