🤖 AI Summary
This work addresses the critical challenge of adapting diffusion distillation to free-form text prompts in open-domain text-to-image (T2I) generation. We present the first systematic transfer of state-of-the-art diffusion distillation techniques to the powerful T2I teacher model FLUX.1-lite. To this end, we propose a unified distillation framework that identifies text-conditioning-induced optimization instability as the root cause and introduces four synergistic strategies: input scaling, dynamic noise scheduling, cross-modal feature alignment, and lightweight network architecture co-optimization. Our method achieves significant inference acceleration, requiring at most 8 sampling steps, while preserving high visual fidelity, and consistently outperforms existing T2I distillation approaches across multi-scale quantitative and qualitative evaluations. To foster reproducibility and practical deployment, we publicly release our complete codebase and pre-trained lightweight student models, enabling efficient on-device T2I generation.
📝 Abstract
Diffusion distillation has dramatically accelerated class-conditional image synthesis, but its applicability to open-ended text-to-image (T2I) generation remains unclear. We present the first systematic study that adapts and compares state-of-the-art distillation techniques on a strong T2I teacher model, FLUX.1-lite. By casting existing methods into a unified framework, we identify the key obstacles that arise when moving from discrete class labels to free-form language prompts. Beyond a thorough methodological analysis, we offer practical guidelines on input scaling, network architecture, and hyperparameters, accompanied by an open-source implementation and pre-trained student models. Our findings establish a solid foundation for deploying fast, high-fidelity, and resource-efficient diffusion generators in real-world T2I applications. Code is available at github.com/alibaba-damo-academy/T2I-Distill.