From Structure to Detail: Hierarchical Distillation for Efficient Diffusion Model

📅 2025-11-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Diffusion models suffer from high inference latency, which severely limits their real-time applicability. Existing distillation approaches face a fundamental trade-off: trajectory distillation preserves structural fidelity but loses fine-grained details, whereas distribution distillation achieves high perceptual fidelity yet suffers from mode collapse and training instability. To address this, we propose a hierarchical distillation framework that first generates a structural skeleton via trajectory distillation and then refines high-frequency details through collaborative distribution distillation. We further introduce an Adaptive Weighted Discriminator (AWD) that dynamically emphasizes local artifacts during adversarial training, mitigating mode collapse and improving training stability. Our method enables high-fidelity single-step image generation. It achieves an FID of 2.26 on ImageNet 256×256, matching the quality of a 250-step teacher model, and significantly outperforms state-of-the-art single-step methods on the MJHQ text-to-image benchmark.

📝 Abstract

The inference latency of diffusion models remains a critical barrier to their real-time application. While trajectory-based and distribution-based step distillation methods offer solutions, they present a fundamental trade-off. Trajectory-based methods preserve global structure but act as a "lossy compressor", sacrificing high-frequency details. Conversely, distribution-based methods can achieve higher fidelity but often suffer from mode collapse and unstable training. This paper recasts them from independent paradigms into synergistic components within our novel Hierarchical Distillation (HD) framework. We leverage trajectory distillation not as a final generator, but to establish a structural "sketch", providing a near-optimal initialization for the subsequent distribution-based refinement stage. This strategy yields an ideal initial distribution that raises the ceiling of overall performance. To further improve quality, we refine the adversarial training process. We find that standard discriminator structures are ineffective at refining an already high-quality generator. To overcome this, we introduce the Adaptive Weighted Discriminator (AWD), tailored for the HD pipeline. By dynamically allocating token weights, AWD focuses on local imperfections, enabling efficient detail refinement. Our approach demonstrates state-of-the-art performance across diverse tasks. On ImageNet $256 \times 256$, our single-step model achieves an FID of 2.26, rivaling its 250-step teacher. It also achieves promising results on the high-resolution text-to-image MJHQ benchmark, demonstrating its generalizability. Our method establishes a robust new paradigm for high-fidelity, single-step diffusion models.
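
To make the phrase "dynamically allocating token weights" concrete, here is a minimal sketch of weighted pooling over per-token discriminator logits. It is an illustration only, not the paper's AWD: the function names, the softmax weighting, the temperature parameter, and the choice of weight scores are all assumptions, since the abstract does not specify how the weights are computed.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def uniform_logit(token_logits):
    """Baseline: average the per-token realism logits uniformly."""
    return sum(token_logits) / len(token_logits)


def weighted_logit(token_logits, weight_scores, temperature=1.0):
    """Pool per-token logits with softmax weights (hypothetical scheme).

    weight_scores would come from a learned head in a real model; here
    they are simply passed in by the caller.
    """
    if len(token_logits) != len(weight_scores):
        raise ValueError("token_logits and weight_scores must align")
    weights = softmax(weight_scores, temperature)
    return sum(w * l for w, l in zip(weights, token_logits))


# Toy input: three tokens look real, one token carries a local artifact.
token_logits = [3.0, 3.0, 3.0, -2.0]
# Assumption for the demo: suspicious tokens receive higher weight scores.
weight_scores = [-l for l in token_logits]

baseline = uniform_logit(token_logits)
focused = weighted_logit(token_logits, weight_scores)
```

With uniform pooling, the single flawed token is averaged away and the image still scores as mostly real. With the weighted pooling, the flawed token dominates the pooled logit, which is one plausible way a discriminator could keep giving a useful signal to a generator that is already close to the target distribution.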

Problem

Research questions and friction points this paper is trying to address.

Reducing diffusion model inference latency for real-time applications

Balancing global structure preservation with high-frequency detail retention

Overcoming mode collapse and training instability in step distillation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical distillation combines trajectory-based and distribution-based methods

Adaptive Weighted Discriminator dynamically focuses on local imperfections

Framework establishes a structural sketch, then refines details efficiently
