🤖 AI Summary
Existing model-fusion approaches struggle to efficiently integrate the vast and growing collection of online expert diffusion models, and they lack the flexibility to meet diverse text-to-image generation requirements. This work proposes an agent-driven, graph-structured framework that, for the first time, organizes online expert models into a scalable graph topology. Through node registration and calibration mechanisms, the framework dynamically activates task-specific subgraphs in response to user demands, enabling on-demand, customized model fusion. By combining graph neural networks, diffusion models, and intelligent agents, the proposed method achieves significant improvements in both fusion flexibility and generation quality across multiple real-world scenarios.
📝 Abstract
The rapid growth of the text-to-image (T2I) community has fostered a thriving online ecosystem of expert models, which are variants of pretrained diffusion models specialized for diverse generative abilities. Yet existing model merging methods remain limited in their ability to exploit these abundant online expert resources and still struggle to satisfy diverse in-the-wild user needs. We present DiffGraph, a novel agent-driven, graph-based model merging framework that automatically harnesses online experts and flexibly merges them to serve diverse user needs. DiffGraph constructs a scalable graph and organizes the ever-expanding pool of online experts within it through node registration and calibration. It then dynamically activates specific subgraphs matched to a user's request, enabling flexible combinations of experts that achieve the desired generation. Extensive experiments demonstrate the efficacy of our method.
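To make the registration-and-activation idea concrete, here is a minimal toy sketch of a graph registry of expert models with on-demand subgraph activation. All class and method names (`ExpertNode`, `ExpertGraph`, `register`, `activate_subgraph`) are hypothetical illustrations inferred from the abstract, not the paper's actual implementation, which additionally involves agents, node calibration, and GNN-based reasoning.

```python
class ExpertNode:
    """An online expert model registered as a node in the graph."""
    def __init__(self, name, abilities):
        self.name = name
        # Generative abilities this expert specializes in (e.g. a style or subject).
        self.abilities = set(abilities)

class ExpertGraph:
    """Scalable registry of expert nodes supporting on-demand subgraph activation."""
    def __init__(self):
        self.nodes = []

    def register(self, node):
        # Node registration: new online experts can be added at any time,
        # so the graph scales with the growing online ecosystem.
        self.nodes.append(node)

    def activate_subgraph(self, required_abilities):
        # Activate only the experts whose abilities overlap the user's request;
        # the paper's framework would then merge this activated subset.
        required = set(required_abilities)
        return [n for n in self.nodes if n.abilities & required]

# Toy usage with invented expert names:
graph = ExpertGraph()
graph.register(ExpertNode("anime-style", ["anime"]))
graph.register(ExpertNode("photo-real", ["photorealism"]))
graph.register(ExpertNode("line-art", ["sketch", "anime"]))

active = graph.activate_subgraph(["anime"])
print([n.name for n in active])  # ['anime-style', 'line-art']
```

The key design point this sketch illustrates is decoupling: experts join the graph independently of any particular request, and fusion happens per-request over the activated subgraph rather than over the whole collection.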