Few-Step Distillation for Text-to-Image Generation: A Practical Guide

📅 2025-12-15
🤖 AI Summary
This work addresses the critical challenge of adapting diffusion distillation to free-form text prompts in open-domain text-to-image (T2I) generation. We present the first systematic transfer of state-of-the-art diffusion distillation techniques to the powerful T2I teacher model FLUX.1-lite. To this end, we propose a unified distillation framework that identifies text-conditioning-induced optimization instability as the root cause and introduces four synergistic strategies: input scaling, dynamic noise scheduling, cross-modal feature alignment, and lightweight network architecture co-optimization. Our method achieves substantial inference acceleration, requiring no more than 8 sampling steps, while preserving high visual fidelity, and consistently outperforms existing T2I distillation approaches across multi-scale quantitative and qualitative evaluations. To foster reproducibility and practical deployment, we publicly release our complete codebase and pre-trained lightweight student models, enabling efficient on-device T2I generation.
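The core idea of few-step distillation, matching a student's cheap prediction to the endpoint of a teacher's long denoising trajectory, can be illustrated with a deliberately simplified sketch. Everything below is a toy stand-in, not the paper's method: the "teacher" is a fixed 50-step shrinkage rule, the "student" is a single learned scalar, and the names (`teacher_denoise`, `student_denoise`) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_denoise(x, steps=50):
    # Toy stand-in for a many-step diffusion teacher: each step removes
    # 10% of the remaining "noise", so the endpoint is (0.9 ** steps) * x.
    for _ in range(steps):
        x = x - 0.1 * x
    return x

def student_denoise(x, w):
    # One-step student: a single learned scaling of the noisy input.
    return w * x

# Distillation loop: fit the student's single step to the teacher's
# 50-step output by minimizing a mean-squared error between them.
w = 1.0
lr = 0.5
for _ in range(200):
    x = rng.normal(size=64)          # batch of noisy samples
    target = teacher_denoise(x)      # teacher trajectory endpoint
    pred = student_denoise(x, w)
    grad = np.mean(2.0 * (pred - target) * x)  # d(MSE)/dw
    w -= lr * grad

# After training, w approximates the teacher's effective contraction 0.9**50,
# i.e. the student reproduces 50 teacher steps in one.
```

A real distillation setup replaces the scalar `w` with a student network and the shrinkage rule with the teacher's learned denoiser, but the objective has the same shape: regress the few-step student onto the many-step teacher's output.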

📝 Abstract
Diffusion distillation has dramatically accelerated class-conditional image synthesis, but its applicability to open-ended text-to-image (T2I) generation is still unclear. We present the first systematic study that adapts and compares state-of-the-art distillation techniques on a strong T2I teacher model, FLUX.1-lite. By casting existing methods into a unified framework, we identify the key obstacles that arise when moving from discrete class labels to free-form language prompts. Beyond a thorough methodological analysis, we offer practical guidelines on input scaling, network architecture, and hyperparameters, accompanied by an open-source implementation and pretrained student models. Our findings establish a solid foundation for deploying fast, high-fidelity, and resource-efficient diffusion generators in real-world T2I applications. Code is available at github.com/alibaba-damo-academy/T2I-Distill.
Problem

Research questions and friction points this paper is trying to address.

Adapting distillation techniques for text-to-image generation
Identifying obstacles in moving from class labels to prompts
Providing practical guidelines for efficient diffusion generators
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapting distillation techniques for text-to-image generation
Providing practical guidelines for scaling and architecture
Establishing foundation for fast high-fidelity diffusion generators
Yifan Pu
Tsinghua University
Yizeng Han
Alibaba DAMO Academy
Dynamic Neural Networks · Efficient Deep Learning · Computer Vision
Zhiwei Tang
DAMO Academy, Alibaba Group
Jiasheng Tang
Hupan Lab
Fan Wang
DAMO Academy, Alibaba Group
Bohan Zhuang
Zhejiang University
Efficient AI · MLSys
Gao Huang
Tsinghua University