🤖 AI Summary
This work addresses the challenge of domain adaptation for specialized multi-task generative modeling (e.g., text-to-image, image editing, 3D generation), where labeled data is scarce and in-domain fine-tuning risks degrading the pre-trained model’s open-world generalization. We propose a label-free, decoupled domain adaptation framework leveraging only unlabeled images. Our method comprises: (1) a decoupled prior preservation mechanism that jointly maintains unconditional generation capability and cross-task controllability; (2) text-agnostic domain knowledge distillation via a lightweight UNet guidance module; and (3) a multi-source guided diffusion framework built on an improved Classifier-Free Guidance scheme, with a theoretically grounded fusion of guidance signals. Experiments across multiple professional domains demonstrate substantial improvements in generation quality and task generalization—without any text annotations—matching or surpassing supervised fine-tuning, while remaining fully compatible with mainstream control paradigms.
📝 Abstract
In-domain generation aims to perform a variety of tasks within a specific domain, such as unconditional generation, text-to-image, image editing, 3D generation, and more. Early research typically required training specialized generators for each unique task and domain, often relying on fully-labeled data. Motivated by the powerful generative capabilities and broad applications of diffusion models, we explore leveraging label-free data to empower these models for in-domain generation. Fine-tuning a pre-trained generative model on domain data is an intuitive but challenging approach: it often requires careful manual hyper-parameter tuning, since the limited diversity of the training data can easily disrupt the model's original generative capabilities. To address this challenge, we propose a guidance-decoupled prior preservation mechanism that achieves high generative quality and controllability using image-only data, inspired by the idea of preserving the pre-trained model from a denoising-guidance perspective. We decouple domain-related guidance from the conditional guidance used in the classifier-free guidance mechanism, preserving the open-world control guidance and unconditional guidance of the pre-trained model. We further propose an efficient domain knowledge learning technique that trains an additional text-free UNet copy to predict domain guidance. Furthermore, we theoretically formulate a multi-guidance in-domain generation pipeline for a variety of generative tasks, leveraging multiple guidance signals from distinct diffusion models and conditions. Extensive experiments demonstrate the superiority of our method in domain-specific synthesis and its compatibility with various diffusion-based control methods and applications.
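The decoupled guidance described above can be illustrated as an extension of the standard classifier-free guidance update: the frozen pre-trained UNet supplies the unconditional and text-conditional predictions, while the text-free domain UNet copy contributes a separate domain-guidance direction. The sketch below is illustrative only; the function name, weight values, and fusion form are assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def fused_noise_prediction(eps_uncond, eps_cond, eps_domain,
                           w_cond=7.5, w_domain=2.0):
    """Hypothetical multi-guidance fusion (a sketch, not the paper's code).

    eps_uncond : unconditional prediction from the frozen pre-trained UNet
    eps_cond   : text-conditional prediction from the frozen pre-trained UNet
    eps_domain : prediction from the lightweight text-free domain UNet copy
    """
    cond_dir = eps_cond - eps_uncond      # preserved open-world control guidance
    domain_dir = eps_domain - eps_uncond  # decoupled domain-related guidance
    # Standard CFG is the special case w_domain = 0.
    return eps_uncond + w_cond * cond_dir + w_domain * domain_dir
```

Setting `w_domain = 0` recovers vanilla classifier-free guidance from the pre-trained model, which is how the mechanism keeps the original unconditional and conditional guidance intact while the domain term is injected independently.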