Cross-Resolution Diffusion Models via Network Pruning

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the semantic inaccuracies and structural instabilities that arise in UNet-based diffusion models during cross-resolution generation, primarily due to the coupling between model parameters and input resolution. To mitigate this issue, the authors propose CR-Diff, a novel approach that, for the first time, employs block-level network pruning to eliminate detrimental weights and subsequently upscales the pruned outputs to refine predictions. This strategy effectively alleviates parameter conflicts across resolutions without requiring retraining. CR-Diff is readily adaptable to various diffusion backbones and generalizes to unseen resolutions, preserving performance at the native resolution while significantly enhancing perceptual fidelity and semantic coherence in cross-resolution synthesis. Moreover, it enables on-demand quality enhancement, offering flexible control over output quality.
📝 Abstract
Diffusion models have demonstrated impressive image synthesis performance, yet many UNet-based models are trained at a fixed resolution. Their quality tends to degrade when generating images at out-of-training resolutions. We trace this issue to resolution-dependent parameter behaviors, where weights that function well at the default resolution can become adverse when spatial scales shift, weakening semantic alignment and causing structural instability in the UNet architecture. Based on this analysis, this paper introduces CR-Diff, a novel method that improves cross-resolution visual consistency by pruning some parameters of the diffusion model. Specifically, CR-Diff has two stages. It first performs block-wise pruning to selectively eliminate adverse weights. Then, pruned-output amplification further refines the pruned predictions. Empirically, extensive experiments suggest that CR-Diff can improve perceptual fidelity and semantic coherence across various diffusion backbones and unseen resolutions, while largely preserving performance at the default resolution. Additionally, CR-Diff supports prompt-specific refinement, enabling quality enhancement on demand.
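The abstract describes a two-stage pipeline (block-wise pruning of adverse weights, then amplification of the pruned outputs) but does not specify either stage's criterion. As a minimal NumPy sketch of the general shape of such a pipeline, assuming simple magnitude-based selection and a scalar amplification gain (both hypothetical stand-ins for the paper's actual rules):

```python
import numpy as np

def prune_block(weights, keep_ratio=0.9):
    """Stage 1 (sketch): zero out the smallest-magnitude weights in one block.

    `keep_ratio` is a hypothetical knob; the paper selects "adverse"
    weights per block, not necessarily by pure magnitude.
    """
    flat = np.abs(weights).ravel()
    k = int(len(flat) * (1.0 - keep_ratio))  # number of weights to drop
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest |w|
    mask = np.abs(weights) > threshold
    return weights * mask

def amplify_output(pruned_out, gain=1.1):
    """Stage 2 (sketch): rescale the pruned block's output.

    A scalar gain is an assumption; the abstract does not describe
    the actual amplification rule.
    """
    return pruned_out * gain

# Toy "UNet" represented as a list of per-block weight matrices.
rng = np.random.default_rng(0)
blocks = [rng.normal(size=(8, 8)) for _ in range(3)]
pruned = [prune_block(w, keep_ratio=0.8) for w in blocks]
sparsity = [float(np.mean(p == 0)) for p in pruned]
```

Since pruning here is applied block by block rather than globally, each block retains the same fraction of its own weights, which loosely mirrors the block-level granularity the paper emphasizes.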
Problem

Research questions and friction points this paper is trying to address.

cross-resolution
diffusion models
resolution generalization
UNet architecture
image synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-resolution generation
network pruning
diffusion models
semantic coherence
resolution adaptation
Jiaxuan Ren
University of Electronic Science and Technology of China
Junhan Zhu
Westlake University
Huan Wang
Westlake University
Efficient AI · Computer Vision · Machine Learning