Learning Graph Representation of Agent Diffuser

📅 2025-05-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing diffusion models rely on static parameters, limiting their adaptability to the dynamic requirements of different stages of image generation. To address this, we propose LGR-AD, a dynamic diffusion generation framework based on multi-agent collaboration. The method models the generation process as a graph-structured interaction among specialized expert agents, explicitly capturing agent relationships with a graph neural network (GNN), and introduces a top-k maximum spanning tree coordination mechanism for efficient inter-agent collaboration. A meta-model-driven composite loss function balances generation fidelity and diversity, and the framework comes with theoretical guarantees of convergence and stability. Extensive experiments show that LGR-AD significantly outperforms state-of-the-art diffusion models across multiple benchmarks. The implementation is publicly available.

📝 Abstract
Diffusion-based generative models have significantly advanced text-to-image synthesis, demonstrating impressive text comprehension and zero-shot generalization. These models refine images from random noise based on textual prompts, with initial reliance on text input shifting towards enhanced visual fidelity over time. This transition suggests that static model parameters might not optimally address the distinct phases of generation. We introduce LGR-AD (Learning Graph Representation of Agent Diffusers), a novel multi-agent system designed to improve adaptability in dynamic computer vision tasks. LGR-AD models the generation process as a distributed system of interacting agents, each representing an expert sub-model. These agents dynamically adapt to varying conditions and collaborate through a graph neural network that encodes their relationships and performance metrics. Our approach employs a coordination mechanism based on top-$k$ maximum spanning trees, optimizing the generation process. Each agent's decision-making is guided by a meta-model that minimizes a novel loss function, balancing accuracy and diversity. Theoretical analysis and extensive empirical evaluations show that LGR-AD outperforms traditional diffusion models across various benchmarks, highlighting its potential for scalable and flexible solutions in complex image generation tasks. Code is available at: https://github.com/YousIA/LGR_AD
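The coordination mechanism described in the abstract selects high-affinity communication links among the expert agents via maximum spanning trees. As a minimal sketch of the underlying idea (not the paper's implementation), a single maximum spanning tree over a hypothetical agent-affinity graph can be built with Kruskal's algorithm; the paper's top-$k$ variant would keep the $k$ highest-scoring such trees:

```python
def max_spanning_tree(n, edges):
    """Kruskal's algorithm on an agent affinity graph.

    n: number of agents.
    edges: list of (weight, u, v), weight = affinity between agents u and v.
    Returns the edge set of one maximum spanning tree.
    """
    parent = list(range(n))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges, reverse=True):  # highest affinity first
        ru, rv = find(u), find(v)
        if ru != rv:            # edge joins two components: no cycle
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

# Hypothetical affinity scores among 4 expert agents
edges = [(0.9, 0, 1), (0.2, 0, 2), (0.8, 1, 2), (0.5, 2, 3), (0.4, 1, 3)]
tree = max_spanning_tree(4, edges)
```

Here the affinity weights and the agent count are illustrative; in LGR-AD they would come from the GNN-encoded relationships and performance metrics.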
Problem

Research questions and friction points this paper is trying to address.

Enhancing adaptability in dynamic computer vision tasks
Optimizing generation phases with multi-agent collaboration
Balancing accuracy and diversity in image generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent system with dynamic adaptability
Graph neural network for agent collaboration
Top-k spanning trees for coordination optimization
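The "graph neural network for agent collaboration" bullet can be illustrated with the simplest form of GNN computation: one round of neighborhood aggregation over the agent graph. The agent features, adjacency, and mean aggregation below are assumptions for illustration, not the paper's architecture:

```python
def message_pass(features, adj):
    """One round of mean-aggregation message passing over the agent graph.

    features: dict agent_id -> list of floats (e.g. performance metrics).
    adj: dict agent_id -> list of neighbour agent ids.
    Each agent's new state is the mean of its own features and its
    neighbours' features, column by column.
    """
    new = {}
    for a, feat in features.items():
        group = [features[b] for b in adj[a]] + [feat]
        new[a] = [sum(col) / len(group) for col in zip(*group)]
    return new

# Three hypothetical agents on a path graph 0 - 1 - 2
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
adj = {0: [1], 1: [0, 2], 2: [1]}
updated = message_pass(feats, adj)
```

A learned GNN would replace the fixed mean with trainable transformations, but the information flow along graph edges is the same.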