Cross-Resolution Diffusion Models via Network Pruning

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the semantic inaccuracies and structural instabilities that arise in UNet-based diffusion models during cross-resolution generation, primarily due to the coupling between model parameters and input resolution. To mitigate this issue, the authors propose CR-Diff, a novel approach that, for the first time, employs block-level network pruning to eliminate detrimental weights and subsequently upscales the pruned outputs to refine predictions. This strategy effectively alleviates parameter conflicts across resolutions without requiring retraining. CR-Diff is readily adaptable to various diffusion backbones and generalizes to unseen resolutions, preserving performance at the native resolution while significantly enhancing perceptual fidelity and semantic coherence in cross-resolution synthesis. Moreover, it enables on-demand quality enhancement, offering flexible control over output quality.
📝 Abstract
Diffusion models have demonstrated impressive image synthesis performance, yet many UNet-based models are trained at a fixed resolution. Their quality tends to degrade when generating images at out-of-training resolutions. We trace this issue to resolution-dependent parameter behaviors, where weights that function well at the default resolution can become adverse when spatial scales shift, weakening semantic alignment and causing structural instability in the UNet architecture. Based on this analysis, this paper introduces CR-Diff, a novel method that improves cross-resolution visual consistency by pruning some parameters of the diffusion model. Specifically, CR-Diff has two stages. It first performs block-wise pruning to selectively eliminate adverse weights. Then, pruned-output amplification further refines the pruned predictions. Empirically, extensive experiments suggest that CR-Diff can improve perceptual fidelity and semantic coherence across various diffusion backbones and unseen resolutions, while largely preserving performance at the default resolution. Additionally, CR-Diff supports prompt-specific refinement, enabling quality enhancement on demand.
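The abstract describes a two-stage pipeline (block-wise pruning of adverse weights, then amplification of the pruned outputs) but does not specify either stage's criterion. As a minimal NumPy sketch of the general shape of such a pipeline, assuming simple magnitude-based selection and a scalar amplification gain (both hypothetical stand-ins for the paper's actual rules):

```python
import numpy as np

def prune_block(weights, keep_ratio=0.9):
    """Stage 1 (sketch): zero out the smallest-magnitude weights in one block.

    `keep_ratio` is a hypothetical knob; the paper selects "adverse"
    weights per block, not necessarily by pure magnitude.
    """
    flat = np.abs(weights).ravel()
    k = int(len(flat) * (1.0 - keep_ratio))  # number of weights to drop
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest |w|
    mask = np.abs(weights) > threshold
    return weights * mask

def amplify_output(pruned_out, gain=1.1):
    """Stage 2 (sketch): rescale the pruned block's output.

    A scalar gain is an assumption; the abstract does not describe
    the actual amplification rule.
    """
    return pruned_out * gain

# Toy "UNet" represented as a list of per-block weight matrices.
rng = np.random.default_rng(0)
blocks = [rng.normal(size=(8, 8)) for _ in range(3)]
pruned = [prune_block(w, keep_ratio=0.8) for w in blocks]
sparsity = [float(np.mean(p == 0)) for p in pruned]
```

Since pruning here is applied block by block rather than globally, each block retains the same fraction of its own weights, which loosely mirrors the block-level granularity the paper emphasizes.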
Problem

Research questions and friction points this paper is trying to address.

cross-resolution
diffusion models
resolution generalization
UNet architecture
image synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-resolution generation
network pruning
diffusion models
semantic coherence
resolution adaptation
Jiaxuan Ren
University of Electronic Science and Technology of China
Junhan Zhu
Westlake University
Huan Wang
Westlake University
Efficient AI · Computer Vision · Machine Learning