🤖 AI Summary
Diffusion models suffer from slow sampling, hindering practical deployment; existing acceleration methods focus primarily on model compression or reducing the number of denoising steps, overlooking the potential of multi-resolution co-optimization. This paper proposes LowDiff, the first framework to systematically integrate cascaded multi-resolution generation into diffusion sampling. LowDiff uses a low-resolution output as guidance and progressively upscales and refines it with a single unified model, operating effectively in both pixel and latent space. Its core innovation lies in tightly coupling multi-resolution conditioning with progressive denoising, substantially reducing the computational cost of high-resolution synthesis. Evaluated on CIFAR-10, FFHQ, and ImageNet, LowDiff achieves over 50% throughput improvement while matching or surpassing baseline quality, e.g., an FID of 2.11 on unconditional CIFAR-10 and an IS of 195.06 on ImageNet 256x256.
📝 Abstract
Diffusion models have achieved remarkable success in image generation, but their practical application is often hindered by slow sampling. Prior efforts to improve efficiency primarily focus on compressing models or reducing the total number of denoising steps, largely neglecting the possibility of leveraging multiple input resolutions in the generation process. In this work, we propose LowDiff, a novel and efficient cascaded diffusion framework that generates outputs at increasingly higher resolutions. LowDiff employs a unified model to progressively refine images from low resolution up to the desired resolution. With the proposed architecture design and generation techniques, we achieve comparable or even superior performance with far fewer high-resolution sampling steps. LowDiff is applicable to diffusion models in both pixel space and latent space. Extensive experiments on both conditional and unconditional generation tasks across CIFAR-10, FFHQ and ImageNet demonstrate the effectiveness and generality of our method. Results show over 50% throughput improvement across all datasets and settings while maintaining comparable or better quality. On unconditional CIFAR-10, LowDiff achieves an FID of 2.11 and an IS of 9.87; on conditional CIFAR-10, an FID of 1.94 and an IS of 10.03. On FFHQ 64x64, LowDiff achieves an FID of 2.43, and on ImageNet 256x256, LowDiff built on LightningDiT-B/1 produces high-quality samples with an FID of 4.00 and an IS of 195.06, together with substantial efficiency gains.
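To make the cascaded sampling idea concrete, here is a minimal toy sketch of the control flow described above: start from noise at a low resolution, denoise, then repeatedly upscale and refine with fewer steps at each higher resolution, so that only a small fraction of sampling steps run at full resolution. All names (`denoise_step`, `upscale`, `lowdiff_sample`) and the per-stage step counts are hypothetical placeholders, not the paper's actual model or schedule; the real LowDiff uses a learned unified denoiser conditioned on the low-resolution output.

```python
import numpy as np

def denoise_step(x, resolution, t):
    # Placeholder for the unified denoiser; in LowDiff a single model
    # serves all resolutions. Here we merely damp the signal.
    return x * 0.9

def upscale(x, factor=2):
    # Nearest-neighbour upsampling stands in for the cascaded upscaler.
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

def lowdiff_sample(target_res=32, base_res=8,
                   steps_per_stage=(10, 4, 2), rng=None):
    """Cascaded sampling sketch: most denoising steps happen at low
    resolution; each higher-resolution stage runs fewer steps, which
    is the source of the claimed throughput gain."""
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal((base_res, base_res))
    res, stage = base_res, 0
    while True:
        n_steps = steps_per_stage[min(stage, len(steps_per_stage) - 1)]
        for t in range(n_steps):
            x = denoise_step(x, res, t)
        if res >= target_res:
            break
        x = upscale(x)   # low-res output conditions the next stage
        res *= 2
        stage += 1
    return x

sample = lowdiff_sample()
print(sample.shape)  # (32, 32)
```

In this toy schedule, 10 steps run at 8x8, 4 at 16x16, and only 2 at the full 32x32 resolution, illustrating how the per-step cost at high resolution is amortized.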