ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation

📅 2025-03-22

🤖 AI Summary

ComfyUI image generation workflows have a steep learning curve because of the intricate connections between nodes and the diversity of modules. Method: The paper proposes ComfyGPT, the first self-optimizing multi-agent system that automatically generates ComfyUI workflows from task descriptions. Rather than generating an entire workflow at once, ComfyGPT generates individual node links, using four specialized agents: ReformatAgent, FlowAgent, RefineAgent, and ExecuteAgent. FlowAgent, the LLM-based workflow generation agent, is trained with supervised fine-tuning (SFT) and reinforcement learning (RL). The paper also introduces FlowDataset, a dataset of 13,571 workflow-description pairs, and FlowBench, a benchmark for evaluating workflow generation systems, together with four metrics: Format Validation (FV), Pass Accuracy (PA), Pass Instruct Alignment (PIA), and Pass Node Diversity (PND). Results: ComfyGPT significantly outperforms existing LLM-based methods in workflow generation.

📝 Abstract

ComfyUI provides a widely adopted, workflow-based interface that enables users to customize various image generation tasks through an intuitive node-based architecture. However, the intricate connections between nodes and diverse modules often present a steep learning curve for users. In this paper, we introduce ComfyGPT, the first self-optimizing multi-agent system designed to automatically generate ComfyUI workflows from task descriptions. ComfyGPT comprises four specialized agents: ReformatAgent, FlowAgent, RefineAgent, and ExecuteAgent. The core innovation of ComfyGPT lies in two key aspects. First, it focuses on generating individual node links rather than entire workflows, significantly improving generation precision. Second, we propose FlowAgent, an LLM-based workflow generation agent that uses both supervised fine-tuning (SFT) and reinforcement learning (RL) to improve workflow generation accuracy. Moreover, we introduce FlowDataset, a large-scale dataset containing 13,571 workflow-description pairs, and FlowBench, a comprehensive benchmark for evaluating workflow generation systems. We also propose four novel evaluation metrics: Format Validation (FV), Pass Accuracy (PA), Pass Instruct Alignment (PIA), and Pass Node Diversity (PND). Experimental results demonstrate that ComfyGPT significantly outperforms existing LLM-based methods in workflow generation.

Problem

Research questions and friction points this paper is trying to address.

Automates ComfyUI workflow generation from task descriptions

Reduces learning curve for node-based image generation tasks

Improves workflow precision via specialized multi-agent system

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-optimizing multi-agent system for ComfyUI

Generates individual node links for precision

LLM-based agent with SFT and RL tuning
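
To make the "individual node links" idea above concrete, here is a minimal Python sketch of how a flat list of links could be assembled into a workflow dictionary. It is an illustration only, not the paper's implementation: the `Link` tuple and the `links_to_workflow` helper are hypothetical names, the output layout is only loosely modeled on ComfyUI's API-style JSON (node id mapped to `class_type` and `inputs`), and the node types and slot indices in the example are assumptions chosen for readability.

```python
from typing import NamedTuple


class Link(NamedTuple):
    """One edge of a workflow (hypothetical format, not taken from the paper)."""
    src_id: str      # id of the node producing the value
    src_type: str    # node type of the producer
    src_output: int  # index of the producer's output slot
    dst_id: str      # id of the node consuming the value
    dst_type: str    # node type of the consumer
    dst_input: str   # name of the consumer's input slot


def links_to_workflow(links):
    """Assemble a node-keyed workflow dict from a flat list of links."""
    workflow = {}
    for link in links:
        # Create each endpoint on first sight; reuse it afterwards.
        src = workflow.setdefault(
            link.src_id, {"class_type": link.src_type, "inputs": {}})
        dst = workflow.setdefault(
            link.dst_id, {"class_type": link.dst_type, "inputs": {}})
        # A node id must always refer to the same node type.
        if src["class_type"] != link.src_type or dst["class_type"] != link.dst_type:
            raise ValueError(f"conflicting node type in link {link}")
        # An input slot can be fed by at most one upstream output.
        if link.dst_input in dst["inputs"]:
            raise ValueError(f"input already connected in link {link}")
        dst["inputs"][link.dst_input] = [link.src_id, link.src_output]
    return workflow


# Example: a loader feeding a text encoder and a sampler (assumed node names).
links = [
    Link("1", "CheckpointLoaderSimple", 1, "2", "CLIPTextEncode", "clip"),
    Link("1", "CheckpointLoaderSimple", 0, "3", "KSampler", "model"),
    Link("2", "CLIPTextEncode", 0, "3", "KSampler", "positive"),
]
workflow = links_to_workflow(links)
```

Because each link is a small, self-contained record, a malformed or contradictory link can be detected and rejected on its own, which is one plausible reason link-level generation might be easier to validate than a whole workflow emitted in one pass.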
