DMM: Building a Versatile Image Generation Model via Distillation-Based Model Merging

📅 2025-04-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Fine-tuning a separate text-to-image (T2I) model for each style incurs substantial parameter redundancy and high storage overhead. Method: This paper proposes distillation-based model merging (DMM), a paradigm that introduces style-prompted generation and uses score distillation to transfer knowledge from multiple teacher models, followed by nonlinear parameter-space fusion, which overcomes the limitations of conventional linear interpolation. Contribution/Results: DMM compresses over 30 specialized style models into a single lightweight, general-purpose model that supports fine-grained style control. Without increasing model size, it significantly outperforms baselines, including task arithmetic and linear interpolation, in generation quality, reduces storage cost by over 75%, and redefines both the objective and the evaluation protocol for T2I model merging.
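
For orientation, the two baselines named in the summary can be sketched on toy data. This is a minimal illustration, not code from the paper: the `Params` type, the function names, and the two-element "checkpoints" are hypothetical stand-ins for real model state dicts, and the task-arithmetic form shown is the commonly used one (base weights plus a scaled sum of per-model deltas).

```python
from typing import Dict, List

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def linear_merge(checkpoints: List[Params], weights: List[float]) -> Params:
    """Static linear interpolation: theta = sum_i w_i * theta_i."""
    if len(checkpoints) != len(weights):
        raise ValueError("need exactly one weight per checkpoint")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights should sum to 1")
    merged: Params = {}
    for name, reference in checkpoints[0].items():
        merged[name] = [
            sum(w * ckpt[name][k] for w, ckpt in zip(weights, checkpoints))
            for k in range(len(reference))
        ]
    return merged


def task_arithmetic(base: Params, checkpoints: List[Params], scale: float) -> Params:
    """Task arithmetic: theta = theta_base + scale * sum_i (theta_i - theta_base)."""
    return {
        name: [
            b + scale * sum(ckpt[name][k] - b for ckpt in checkpoints)
            for k, b in enumerate(values)
        ]
        for name, values in base.items()
    }


# Toy example: two "style" checkpoints fine-tuned from the same base.
base = {"w": [0.0, 0.0]}
anime = {"w": [1.0, 0.0]}
realistic = {"w": [0.0, 2.0]}

merged = linear_merge([anime, realistic], [0.5, 0.5])
shifted = task_arithmetic(base, [anime, realistic], scale=0.5)
```

Both baselines produce one fixed set of weights, so every style is blended into the same parameters with no way to select a style at inference time. That is the limitation the summary attributes to static merging.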

📝 Abstract

The success of text-to-image (T2I) generation models has spurred a proliferation of model checkpoints fine-tuned from the same base model on various specialized datasets. This flood of specialized models brings high parameter redundancy and large storage costs, which calls for effective methods to consolidate the capabilities of diverse, powerful models into a single one. A common practice in model merging is static linear interpolation in parameter space, used to achieve style mixing. However, this practice overlooks a feature of the T2I generation task: the many distinct models cover widely varying styles, which may lead to incompatibility and confusion in the merged model. To address this issue, we introduce a style-promptable image generation pipeline that can accurately generate arbitrary-style images under the control of style vectors. Based on this design, we propose a score-distillation-based model merging paradigm (DMM) that compresses multiple models into a single versatile T2I model. Moreover, we rethink and reformulate the model merging task in the context of T2I generation by presenting new merging goals and evaluation protocols. Our experiments demonstrate that DMM can compactly reorganize the knowledge from multiple teacher models and achieve controllable arbitrary-style generation.

Problem

Research questions and friction points this paper is trying to address.

Reducing parameter redundancy and storage costs in specialized T2I models

Addressing style incompatibility in merged text-to-image generation models

Enabling controllable arbitrary-style generation via a single versatile model

Innovation

Methods, ideas, or system contributions that make the work stand out.

Style-promptable pipeline for versatile image generation

Score distillation based model merging paradigm

Reformulated merging goals and evaluation protocols
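
The first two items above can be illustrated with a deliberately tiny sketch. It is not the authors' implementation: real teachers and students are diffusion networks, whereas here each teacher "denoiser" is a scalar gain on the noisy input. The `Student` class, `TEACHER_GAINS`, the one-hot style vectors, and the plain squared-error loss are all assumptions made for illustration. The idea shown is that a single student, conditioned on a style vector, is trained so that its noise prediction matches the prediction of the teacher selected by that vector.

```python
import random
from typing import List

# Hypothetical teachers: each one predicts noise as gain * x_t.
TEACHER_GAINS = [0.2, 0.9, -0.5]
NUM_STYLES = len(TEACHER_GAINS)


def teacher_eps(style_idx: int, x_t: float) -> float:
    """Noise prediction of the teacher for one style (toy scalar model)."""
    return TEACHER_GAINS[style_idx] * x_t


def one_hot(idx: int, n: int) -> List[float]:
    return [1.0 if k == idx else 0.0 for k in range(n)]


class Student:
    """One model for all styles: shared weight plus a style-vector projection."""

    def __init__(self) -> None:
        self.shared = 0.0
        self.style_proj = [0.0] * NUM_STYLES

    def gain(self, style_vec: List[float]) -> float:
        return self.shared + sum(s * p for s, p in zip(style_vec, self.style_proj))

    def eps(self, x_t: float, style_vec: List[float]) -> float:
        return self.gain(style_vec) * x_t


def distill(student: Student, steps: int = 5000, lr: float = 0.02, seed: int = 0) -> Student:
    """SGD on (eps_student(x_t, style) - eps_teacher_style(x_t))^2."""
    rng = random.Random(seed)
    for _ in range(steps):
        i = rng.randrange(NUM_STYLES)      # pick a teacher / style
        x_t = rng.gauss(0.0, 1.0)          # toy "noisy latent"
        style_vec = one_hot(i, NUM_STYLES)
        err = student.eps(x_t, style_vec) - teacher_eps(i, x_t)
        grad_gain = 2.0 * err * x_t        # d(err^2) / d(gain)
        student.shared -= lr * grad_gain
        for k in range(NUM_STYLES):
            student.style_proj[k] -= lr * grad_gain * style_vec[k]
    return student


student = distill(Student())
per_style_gap = [
    abs(student.eps(1.0, one_hot(i, NUM_STYLES)) - teacher_eps(i, 1.0))
    for i in range(NUM_STYLES)
]
# In this linear toy, a mixed style vector blends the first two teachers.
mixed = student.eps(1.0, [0.5, 0.5, 0.0])
```

After training, each one-hot style vector makes the student reproduce the matching teacher, so the style is chosen at inference time rather than baked into a single averaged set of weights. The clean blending under a mixed style vector is a property of this linear toy and should not be read as a claim about the paper's model.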

🔎 Similar Papers

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation