🤖 AI Summary
Diffusion models face dual challenges—slow sampling and degraded generation quality—when adapting to new domains; existing two-stage approaches (adapt-then-distill or distill-then-adapt) are architecturally complex and compromise either diversity or fidelity. This paper proposes Uni-DAD, the first framework unifying knowledge distillation and domain adaptation into a single-stage training process. Its core innovations include a dual-domain distribution-matching distillation objective, multi-head GAN losses, and a target-domain teacher guidance mechanism, which jointly enable faithful preservation of source-domain knowledge and high-fidelity, realistic generation in the target domain. Evaluated on few-shot image generation and subject-driven personalization tasks, Uni-DAD achieves state-of-the-art performance with at most 4 sampling steps, significantly improving both generation quality and diversity over prior methods.
📝 Abstract
Diffusion models (DMs) produce high-quality images, yet their sampling remains costly when they are adapted to new domains. Distilled DMs are faster but typically remain confined to their teacher's domain. Thus, fast and high-quality generation for novel domains relies on two-stage training pipelines: Adapt-then-Distill or Distill-then-Adapt. However, both add design complexity and suffer from degraded quality or diversity. We introduce Uni-DAD, a single-stage pipeline that unifies distillation and adaptation of DMs. It couples two signals during training: (i) a dual-domain distribution-matching distillation objective that guides the student toward the distributions of the source teacher and a target teacher, and (ii) a multi-head generative adversarial network (GAN) loss that encourages target realism across multiple feature scales. The source-domain distillation preserves diverse source knowledge, while the multi-head GAN stabilizes training and reduces overfitting, especially in few-shot regimes. The inclusion of a target teacher facilitates adaptation to more structurally distant domains. We evaluate on a variety of datasets for few-shot image generation (FSIG) and subject-driven personalization (SDP). Uni-DAD delivers higher quality than state-of-the-art (SoTA) adaptation methods even with fewer than 4 sampling steps, and outperforms two-stage training pipelines in both quality and diversity.
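The two coupled signals described above can be sketched as a single weighted objective. This is a minimal numerical toy, not the paper's implementation: the function names, the hinge-style GAN formulation, and the weights `lam_src`, `lam_tgt`, `lam_gan` are illustrative assumptions, and real scores/features would come from diffusion networks and discriminator heads.

```python
import numpy as np

def dmd_loss(student_score, teacher_score):
    # Toy distribution-matching term: squared discrepancy between the
    # student's score estimate and a teacher's score estimate.
    return float(np.mean((student_score - teacher_score) ** 2))

def multi_head_gan_loss(real_feats, fake_feats):
    # Hinge-style adversarial loss averaged over several feature scales
    # ("heads"); each (real, fake) pair plays the role of one head's output.
    per_head = [np.mean(np.maximum(0.0, 1.0 - r) + np.maximum(0.0, 1.0 + f))
                for r, f in zip(real_feats, fake_feats)]
    return float(np.mean(per_head))

def uni_dad_loss(student_score, src_teacher_score, tgt_teacher_score,
                 real_feats, fake_feats,
                 lam_src=1.0, lam_tgt=1.0, lam_gan=0.1):
    # Single-stage objective: source-domain distillation (preserves diversity)
    # + target-domain distillation (handles distant domains)
    # + multi-head GAN realism term (stabilizes few-shot training).
    return (lam_src * dmd_loss(student_score, src_teacher_score)
            + lam_tgt * dmd_loss(student_score, tgt_teacher_score)
            + lam_gan * multi_head_gan_loss(real_feats, fake_feats))
```

In practice the distillation terms are gradients through frozen teacher score networks and the GAN heads sit on intermediate discriminator features, but the weighted-sum structure is the essence of training both objectives in one stage.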