Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models

📅 2025-03-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion models struggle to generate images beyond their training resolution (here, above 1K), often producing structural distortions or content repetition. Reference-based methods mitigate this by upsampling a low-resolution reference to guide higher-resolution generation, but upsampling in latent space causes manifold deviation, while upsampling in RGB space tends to produce overly smoothed outputs. This paper proposes LSRNA, a framework that combines Latent space Super-Resolution (LSR) and Region-wise Noise Addition (RNA): LSR performs the upsampling directly in latent space while keeping the result aligned with the latent manifold, and RNA adds noise region by region to enhance high-frequency details. Experiments show that integrating LSRNA outperforms state-of-the-art reference-based methods across various resolutions and metrics, and they highlight the role of latent space upsampling in preserving detail and sharpness. The code is publicly available.

📝 Abstract
In this paper, we propose LSRNA, a novel framework for higher-resolution (exceeding 1K) image generation using diffusion models by leveraging super-resolution directly in the latent space. Existing diffusion models struggle with scaling beyond their training resolutions, often leading to structural distortions or content repetition. Reference-based methods address these issues by upsampling a low-resolution reference to guide higher-resolution generation. However, they face significant challenges: upsampling in latent space often causes manifold deviation, which degrades output quality, while upsampling in RGB space tends to produce overly smoothed outputs. To overcome these limitations, LSRNA combines Latent space Super-Resolution (LSR) for manifold alignment and Region-wise Noise Addition (RNA) to enhance high-frequency details. Our extensive experiments demonstrate that integrating LSRNA outperforms state-of-the-art reference-based methods across various resolutions and metrics, while showing the critical role of latent space upsampling in preserving detail and sharpness. The code is available at https://github.com/3587jjh/LSRNA.
Problem

Research questions and friction points this paper is trying to address.

Overcoming structural distortions and content repetition when diffusion models generate beyond their training resolution
Addressing manifold deviation in latent space upsampling
Enhancing high-frequency details without over-smoothing in RGB space
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent space super-resolution for manifold alignment
Region-wise noise addition for high-frequency details
Combining LSR and RNA (LSRNA) for image generation beyond 1K resolution
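
The abstract does not say how RNA selects regions or scales the noise, so the following is only a minimal sketch of the general idea of region-wise noise addition, not the authors' implementation. It assumes a hypothetical per-pixel weight map derived from the local gradient magnitude of the latent, so that detailed regions receive more Gaussian noise than flat ones. The function names, the gradient-based weighting, and the `w_min`/`w_max` parameters are all illustrative assumptions.

```python
import numpy as np


def detail_weight_map(latent, w_min=0.0, w_max=1.0):
    """Hypothetical weight map: normalized gradient magnitude of a (C, H, W) latent.

    Assumption: "regions" are identified by local detail, measured here as the
    channel-averaged spatial gradient magnitude. The paper may use another cue.
    """
    gy, gx = np.gradient(latent, axis=(1, 2))      # spatial derivatives per channel
    mag = np.sqrt(gx ** 2 + gy ** 2).mean(axis=0)  # (H, W) detail strength
    span = mag.max() - mag.min()
    if span > 0:
        norm = (mag - mag.min()) / span            # rescale to [0, 1]
    else:
        norm = np.zeros_like(mag)                  # flat latent: no detail anywhere
    return w_min + (w_max - w_min) * norm


def region_wise_noise_addition(latent, weights, rng):
    """Add Gaussian noise to a (C, H, W) latent, scaled per pixel by `weights` (H, W)."""
    noise = rng.standard_normal(latent.shape)
    return latent + weights[None, :, :] * noise
```

In this sketch, a latent with a single vertical edge would receive noise only around the edge when `w_min` is 0; raising `w_min` adds a noise floor everywhere.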