Learning Graph Representation of Agent Diffuser

📅 2025-05-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing diffusion models rely on static parameters, limiting their adaptability to the dynamic requirements of different stages of image generation. To address this, we propose LGR-AD, a dynamic diffusion generation framework based on multi-agent collaboration. The method models the generation process as a graph-structured interaction among specialized expert agents, explicitly capturing agent relationships with a graph neural network (GNN), and introduces a top-k maximum spanning tree coordination mechanism for efficient inter-agent collaboration. A meta-model-driven composite loss function balances generation fidelity and diversity, and the framework comes with theoretical guarantees of convergence and stability. Extensive experiments show that LGR-AD significantly outperforms state-of-the-art diffusion models across multiple benchmarks. The implementation is publicly available.

📝 Abstract
Diffusion-based generative models have significantly advanced text-to-image synthesis, demonstrating impressive text comprehension and zero-shot generalization. These models refine images from random noise based on textual prompts, with initial reliance on text input shifting towards enhanced visual fidelity over time. This transition suggests that static model parameters might not optimally address the distinct phases of generation. We introduce LGR-AD (Learning Graph Representation of Agent Diffusers), a novel multi-agent system designed to improve adaptability in dynamic computer vision tasks. LGR-AD models the generation process as a distributed system of interacting agents, each representing an expert sub-model. These agents dynamically adapt to varying conditions and collaborate through a graph neural network that encodes their relationships and performance metrics. Our approach employs a coordination mechanism based on top-$k$ maximum spanning trees, optimizing the generation process. Each agent's decision-making is guided by a meta-model that minimizes a novel loss function, balancing accuracy and diversity. Theoretical analysis and extensive empirical evaluations show that LGR-AD outperforms traditional diffusion models across various benchmarks, highlighting its potential for scalable and flexible solutions in complex image generation tasks. Code is available at: https://github.com/YousIA/LGR_AD
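The coordination mechanism described in the abstract selects high-affinity communication links among the expert agents via maximum spanning trees. As a minimal sketch of the underlying idea (not the paper's implementation), a single maximum spanning tree over a hypothetical agent-affinity graph can be built with Kruskal's algorithm; the paper's top-$k$ variant would keep the $k$ highest-scoring such trees:

```python
def max_spanning_tree(n, edges):
    """Kruskal's algorithm on an agent affinity graph.

    n: number of agents.
    edges: list of (weight, u, v), weight = affinity between agents u and v.
    Returns the edge set of one maximum spanning tree.
    """
    parent = list(range(n))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges, reverse=True):  # highest affinity first
        ru, rv = find(u), find(v)
        if ru != rv:            # edge joins two components: no cycle
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

# Hypothetical affinity scores among 4 expert agents
edges = [(0.9, 0, 1), (0.2, 0, 2), (0.8, 1, 2), (0.5, 2, 3), (0.4, 1, 3)]
tree = max_spanning_tree(4, edges)
```

Here the affinity weights and the agent count are illustrative; in LGR-AD they would come from the GNN-encoded relationships and performance metrics.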
Problem

Research questions and friction points this paper is trying to address.

Enhancing adaptability in dynamic computer vision tasks
Optimizing generation phases with multi-agent collaboration
Balancing accuracy and diversity in image generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent system with dynamic adaptability
Graph neural network for agent collaboration
Top-k spanning trees for coordination optimization
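The "graph neural network for agent collaboration" bullet can be illustrated with the simplest form of GNN computation: one round of neighborhood aggregation over the agent graph. The agent features, adjacency, and mean aggregation below are assumptions for illustration, not the paper's architecture:

```python
def message_pass(features, adj):
    """One round of mean-aggregation message passing over the agent graph.

    features: dict agent_id -> list of floats (e.g. performance metrics).
    adj: dict agent_id -> list of neighbour agent ids.
    Each agent's new state is the mean of its own features and its
    neighbours' features, column by column.
    """
    new = {}
    for a, feat in features.items():
        group = [features[b] for b in adj[a]] + [feat]
        new[a] = [sum(col) / len(group) for col in zip(*group)]
    return new

# Three hypothetical agents on a path graph 0 - 1 - 2
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
adj = {0: [1], 1: [0, 2], 2: [1]}
updated = message_pass(feats, adj)
```

A learned GNN would replace the fixed mean with trainable transformations, but the information flow along graph edges is the same.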