ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation

📅 2025-03-22
🤖 AI Summary
ComfyUI image generation workflows have a high entry barrier and a steep learning curve because of complex node interconnections and intricate visual programming. Method: The paper proposes ComfyGPT, described as the first self-optimizing multi-agent system that automatically generates ComfyUI workflows from natural language task descriptions. Rather than generating an entire workflow at once, ComfyGPT generates individual node links through four collaborating agents (ReformatAgent, FlowAgent, RefineAgent, and ExecuteAgent). FlowAgent is an LLM-based agent trained with supervised fine-tuning (SFT) and reinforcement learning (RL). The paper also introduces FlowDataset, a dataset of 13,571 workflow-description pairs, and FlowBench, a benchmark for workflow generation systems, together with four metrics: Format Validation (FV), Pass Accuracy (PA), Pass Instruct Alignment (PIA), and Pass Node Diversity (PND). Results: ComfyGPT is reported to significantly outperform existing LLM-based methods in workflow generation.

📝 Abstract
ComfyUI provides a widely adopted, workflow-based interface that enables users to customize various image generation tasks through an intuitive node-based architecture. However, the intricate connections between nodes and diverse modules often present a steep learning curve for users. In this paper, we introduce ComfyGPT, the first self-optimizing multi-agent system designed to automatically generate ComfyUI workflows from task descriptions. ComfyGPT comprises four specialized agents: ReformatAgent, FlowAgent, RefineAgent, and ExecuteAgent. The core innovation of ComfyGPT lies in two key aspects. First, it focuses on generating individual node links rather than entire workflows, significantly improving generation precision. Second, we propose FlowAgent, an LLM-based workflow generation agent that uses both supervised fine-tuning (SFT) and reinforcement learning (RL) to improve workflow generation accuracy. Moreover, we introduce FlowDataset, a large-scale dataset containing 13,571 workflow-description pairs, and FlowBench, a comprehensive benchmark for evaluating workflow generation systems. We also propose four novel evaluation metrics: Format Validation (FV), Pass Accuracy (PA), Pass Instruct Alignment (PIA), and Pass Node Diversity (PND). Experimental results demonstrate that ComfyGPT significantly outperforms existing LLM-based methods in workflow generation.
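To make the link-level idea concrete, here is a minimal sketch, not the paper's implementation. It assumes a hypothetical link record (source node, source class, output index, target node, target class, input name) and assembles such records into a dictionary shaped like a ComfyUI API-format prompt, where an input fed by another node is stored as a `[node_id, output_index]` pair. The record layout and the name `links_to_workflow` are illustrative assumptions; the abstract does not give the actual link schema.

```python
from collections import namedtuple

# Hypothetical link record; the paper's actual link schema is not given here.
Link = namedtuple("Link", "src_id src_class src_output dst_id dst_class dst_input")

def links_to_workflow(links):
    """Assemble per-link records into a node graph keyed by node id."""
    workflow = {}
    for link in links:
        # Register both endpoints, checking that a node id keeps a single class.
        for node_id, class_type in ((link.src_id, link.src_class),
                                    (link.dst_id, link.dst_class)):
            node = workflow.setdefault(node_id, {"class_type": class_type, "inputs": {}})
            if node["class_type"] != class_type:
                raise ValueError(f"node {node_id} has conflicting classes")
        # An input fed by another node is stored as [source id, output index].
        workflow[link.dst_id]["inputs"][link.dst_input] = [link.src_id, link.src_output]
    return workflow

# Three links describing part of a text-to-image graph.
links = [
    Link("1", "CheckpointLoaderSimple", 0, "3", "KSampler", "model"),
    Link("1", "CheckpointLoaderSimple", 1, "2", "CLIPTextEncode", "clip"),
    Link("2", "CLIPTextEncode", 0, "3", "KSampler", "positive"),
]
workflow = links_to_workflow(links)
```

Because each link is a small, independently checkable unit, a malformed link can be rejected or repaired without regenerating the whole graph, which is one plausible reading of why link-level generation helps precision.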
Problem

Research questions and friction points this paper is trying to address.

Automates ComfyUI workflow generation from task descriptions
Reduces learning curve for node-based image generation tasks
Improves workflow precision via specialized multi-agent system
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-optimizing multi-agent system for ComfyUI
Generates individual node links for precision
LLM-based agent with SFT and RL tuning
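
The four agents can be pictured as a simple chain. The sketch below is an illustration under stated assumptions, not the authors' code: the stage order follows the order in which the agents are named, and the role attributed to each stage in the comments is inferred from its name only. `run_pipeline` and the toy stand-in functions are hypothetical.

```python
from typing import Callable, List, Tuple

LinkList = List[Tuple[str, str]]

def run_pipeline(task: str,
                 reformat: Callable[[str], str],
                 flow: Callable[[str], LinkList],
                 refine: Callable[[LinkList], LinkList],
                 execute: Callable[[LinkList], dict]) -> dict:
    """Chain the four stages in the order the agents are named."""
    prompt = reformat(task)   # ReformatAgent: normalize the task description (assumed role)
    links = flow(prompt)      # FlowAgent: propose node links
    links = refine(links)     # RefineAgent: repair or filter links (assumed role)
    return execute(links)     # ExecuteAgent: build and run the workflow (assumed role)

# Toy stand-ins so the sketch runs without any model or ComfyUI server.
result = run_pipeline(
    "  Generate a cat image ",
    reformat=lambda t: t.strip().lower(),
    flow=lambda p: [("1", "3"), ("1", "3"), ("2", "3")],
    refine=lambda ls: sorted(set(ls)),  # drop duplicate links
    execute=lambda ls: {"num_links": len(ls), "links": ls},
)
```

Passing each stage in as a callable keeps the chain testable in isolation; in a real system each callable would wrap a model call or a ComfyUI request.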
Authors

Oucheng Huang
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University
Yuhang Ma
Bytedance, University College London
Generative AI, Multi-module Pretraining, (Conditional) Text-to-image Generation (AIGC)
Zeng Zhao
Fuxi AI Lab, Netease Inc.
Mingrui Wu
XMU
MLLM, T2I
Jiayi Ji
Rutgers University
Rongsheng Zhang
Fuxi AI Lab, NetEase Inc., Hangzhou, China
NLP
Zhipeng Hu
Fuxi AI Lab, Netease Inc.
Xiaoshuai Sun
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University
Rongrong Ji
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University