Cross-Resolution Distribution Matching for Diffusion Distillation

📅 2026-03-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing diffusion distillation methods suffer significant degradation in cross-resolution generation quality when inference steps are reduced, primarily due to the mismatch between low- and high-resolution data distributions. To address this, the work proposes Cross-Resolution Distribution Matching Distillation (RMD), a framework that systematically mitigates cross-resolution distribution discrepancies. RMD partitions timesteps according to the logSNR schedule, introduces a logSNR mapping to compensate for resolution-induced shifts, and aligns distributions along the resolution trajectory. Coupled with a predicted-noise re-injection mechanism, RMD substantially improves training stability and generation fidelity, achieving up to 33.4× and 25.6× inference speedups on SDXL and Wan2.1-14B, respectively, while preserving high visual quality.
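The logSNR mapping mentioned above can be illustrated with a minimal sketch. It assumes the standard resolution-shift rule for diffusion noise schedules: downsampling by a factor `s` averages `s²` pixels, which raises the signal-to-noise ratio by `s²`, i.e. a `+2·log(s)` shift in logSNR. The function name and this exact rule are illustrative assumptions; the paper's actual mapping may differ.

```python
import math

def logsnr_shift(logsnr_high: float, scale: float) -> float:
    """Map a high-resolution logSNR to the equivalent low-resolution logSNR.

    Downsampling by `scale` averages scale**2 pixels, boosting SNR by
    scale**2, hence a +2*log(scale) shift in logSNR.
    (Hypothetical sketch; not the paper's exact formulation.)
    """
    return logsnr_high + 2.0 * math.log(scale)
```

For example, halving the resolution (`scale=2`) shifts the logSNR up by about 1.386, so the low-resolution stage can reuse the same noise level at an effectively cleaner operating point.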

📝 Abstract
Diffusion distillation is central to accelerating image and video generation, yet existing methods are fundamentally limited by the denoising process, where step reduction has largely saturated. Partial-timestep low-resolution generation can further accelerate inference, but it suffers from noticeable quality degradation due to cross-resolution distribution gaps. We propose Cross-Resolution Distribution Matching Distillation (RMD), a novel distillation framework that bridges cross-resolution distribution gaps for high-fidelity, few-step multi-resolution cascaded inference. Specifically, RMD divides the timestep intervals for each resolution using logarithmic signal-to-noise ratio (logSNR) curves, and introduces logSNR-based mapping to compensate for resolution-induced shifts. Distribution matching is conducted along resolution trajectories to reduce the gap between low-resolution generator distributions and the teacher's high-resolution distribution. In addition, a predicted-noise re-injection mechanism is incorporated during upsampling to stabilize training and improve synthesis quality. Quantitative and qualitative results show that RMD preserves high-fidelity generation while accelerating inference across various backbones. Notably, RMD achieves up to 33.4× speedup on SDXL and 25.6× on Wan2.1-14B, while preserving high visual fidelity.
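A hedged sketch of the timestep-interval division described in the abstract: in cascaded inference, the high-noise (low-logSNR) segment of the trajectory is handled at low resolution and the low-noise segment at high resolution, so a timestep grid can be split at a logSNR boundary. The function name, the uniform grid, and the single-boundary rule are illustrative assumptions, not the paper's implementation.

```python
def split_timesteps(logsnr_fn, n_steps, boundary_logsnr):
    """Partition a uniform timestep grid on [0, 1] into a low-resolution
    (high-noise, logSNR below the boundary) segment and a high-resolution
    (low-noise) segment. Hypothetical sketch of the idea only.
    """
    ts = [i / (n_steps - 1) for i in range(n_steps)]
    low_res = [t for t in ts if logsnr_fn(t) < boundary_logsnr]
    high_res = [t for t in ts if logsnr_fn(t) >= boundary_logsnr]
    return low_res, high_res
```

With a toy decreasing schedule such as `logsnr_fn = lambda t: 4 - 8 * t` and a boundary of 0, the late (noisier) timesteps fall into the low-resolution segment and the early (cleaner) timesteps into the high-resolution one.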
Problem

Research questions and friction points this paper is trying to address.

diffusion distillation
cross-resolution distribution gap
low-resolution generation
inference acceleration
distribution mismatch
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-Resolution Distribution Matching
Diffusion Distillation
logSNR-based Mapping
Noise Re-injection
Multi-resolution Cascaded Inference