DiffGraph: An Automated Agent-driven Model Merging Framework for In-the-Wild Text-to-Image Generation

📅 2026-03-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing model merging approaches struggle to make efficient use of the large and growing collection of online expert diffusion models, and they lack the flexibility to meet diverse text-to-image generation requirements. This work proposes DiffGraph, an agent-driven, graph-based framework that organizes online expert models into a scalable graph. Node registration and calibration let the graph absorb new experts as they appear. The framework then dynamically activates task-specific subgraphs in response to user needs, enabling on-demand, customized model merging. The authors report that extensive experiments demonstrate the method's efficacy.

📝 Abstract
The rapid growth of the text-to-image (T2I) community has fostered a thriving online ecosystem of expert models, which are variants of pretrained diffusion models specialized for diverse generative abilities. Yet existing model merging methods make only limited use of these abundant online expert resources and still struggle to meet diverse in-the-wild user needs. We present DiffGraph, a novel agent-driven, graph-based model merging framework that automatically harnesses online experts and flexibly merges them for diverse user needs. DiffGraph constructs a scalable graph and organizes the ever-expanding set of online experts within it through node registration and calibration. It then dynamically activates specific subgraphs based on user needs, enabling flexible combinations of different experts to achieve the generation the user wants. Extensive experiments show the efficacy of our method.
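
The register, calibrate, activate, and merge flow described in the abstract can be pictured with a small toy sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the names `ExpertNode`, `ExpertGraph`, and `merge` are hypothetical, calibration is reduced to a single scalar `scale` per expert, subgraph activation is simple tag matching rather than an agent's decision, and merging uses a plain average of weight deltas over a shared base model (a common task-arithmetic-style technique, swapped in here because the abstract does not specify the merging rule).

```python
from dataclasses import dataclass, field


@dataclass
class ExpertNode:
    """One registered expert (hypothetical layout, for illustration only)."""
    name: str
    tags: frozenset      # capabilities this expert claims, e.g. {"anime"}
    weights: dict        # parameter name -> list of floats (toy stand-in for tensors)
    scale: float = 1.0   # assumed scalar calibration factor


@dataclass
class ExpertGraph:
    """Toy registry: nodes only, no edges, to keep the sketch short."""
    nodes: dict = field(default_factory=dict)

    def register(self, node: ExpertNode) -> None:
        # Registering a node is how the graph grows as new experts appear.
        self.nodes[node.name] = node

    def activate(self, needs: set) -> list:
        # The "active subgraph" here is every expert whose tags overlap the needs.
        return [n for n in self.nodes.values() if n.tags & needs]


def merge(base: dict, experts: list) -> dict:
    """Add the average of the calibrated expert deltas to the base weights."""
    if not experts:
        return {k: list(v) for k, v in base.items()}
    merged = {}
    for key, base_vals in base.items():
        merged[key] = [
            b + sum(e.scale * (e.weights[key][i] - b) for e in experts) / len(experts)
            for i, b in enumerate(base_vals)
        ]
    return merged


# Usage: two experts fine-tuned from the same toy base model.
base = {"w": [0.0, 1.0]}
graph = ExpertGraph()
graph.register(ExpertNode("anime-expert", frozenset({"anime"}), {"w": [1.0, 1.0]}))
graph.register(ExpertNode("photo-expert", frozenset({"photo"}), {"w": [0.0, 3.0]}))

active = graph.activate({"anime", "photo"})
merged = merge(base, active)
print(merged)  # {'w': [0.5, 2.0]}
```

Keeping registration separate from activation mirrors the abstract's two stages: the graph can keep accepting new experts without touching the merge step, and each user request only pays for the experts it activates.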
Problem

Research questions and friction points this paper is trying to address.

text-to-image generation
model merging
expert models
in-the-wild user needs
diffusion models
Innovation

Methods, ideas, or system contributions that make the work stand out.

model merging
agent-driven
graph-based framework
text-to-image generation
expert models