🤖 AI Summary
ComfyUI image generation workflows suffer from high entry barriers and steep learning curves due to complex node interconnections and intricate visual programming.
Method: We propose ComfyGPT, the first self-optimizing multi-agent system that automatically generates ComfyUI workflows from natural language task descriptions. Rather than generating entire workflows at once, ComfyGPT generates individual node links using four specialized agents (ReformatAgent, FlowAgent, RefineAgent, ExecuteAgent). Its workflow-generation agent, FlowAgent, is an LLM trained with supervised fine-tuning (SFT) and reinforcement learning (RL). We also introduce FlowDataset, a dataset of 13,571 workflow-description pairs, and FlowBench, a benchmark for evaluating workflow generation systems, together with four metrics: Format Validation (FV), Pass Accuracy (PA), Pass Instruct Alignment (PIA), and Pass Node Diversity (PND).
Results: ComfyGPT significantly outperforms existing LLM-based methods in workflow generation, establishing a novel paradigm for automating visual AI workflow construction.
📝 Abstract
ComfyUI provides a widely adopted, workflow-based interface that enables users to customize various image generation tasks through an intuitive node-based architecture. However, the intricate connections between nodes and diverse modules often present a steep learning curve for users. In this paper, we introduce ComfyGPT, the first self-optimizing multi-agent system designed to automatically generate ComfyUI workflows from task descriptions. ComfyGPT comprises four specialized agents: ReformatAgent, FlowAgent, RefineAgent, and ExecuteAgent. The core innovation of ComfyGPT lies in two key aspects. First, it focuses on generating individual node links rather than entire workflows, significantly improving generation precision. Second, we propose FlowAgent, an LLM-based workflow generation agent that uses both supervised fine-tuning (SFT) and reinforcement learning (RL) to improve workflow generation accuracy. Moreover, we introduce FlowDataset, a large-scale dataset containing 13,571 workflow-description pairs, and FlowBench, a comprehensive benchmark for evaluating workflow generation systems. We also propose four novel evaluation metrics: Format Validation (FV), Pass Accuracy (PA), Pass Instruct Alignment (PIA), and Pass Node Diversity (PND). Experimental results demonstrate that ComfyGPT significantly outperforms existing LLM-based methods in workflow generation.
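The abstract's central design choice is that ComfyGPT emits individual node links instead of a whole workflow in one pass. The following is a minimal sketch, under our own assumptions, of how such links could be merged into ComfyUI's API-format JSON. In that format, each node is keyed by an id and has a `class_type` and `inputs`, and a connection is written as `[source_id, output_index]`. The `Link` tuple, the `assemble_workflow` helper, and its per-link checks are hypothetical illustrations and are not taken from the paper. The node class names are standard ComfyUI nodes, and the checkpoint file name is a placeholder.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical edge record: (source node id, source output index, target node id, target input name).
# This is an illustrative assumption, not ComfyGPT's actual output format.
Link = Tuple[str, int, str, str]


def assemble_workflow(
    node_types: Dict[str, str],
    widget_inputs: Dict[str, Dict[str, Any]],
    links: List[Link],
) -> Dict[str, Dict[str, Any]]:
    """Merge individually generated links into a ComfyUI API-format workflow dict."""
    # Start every declared node with a copy of its literal (widget) inputs.
    workflow: Dict[str, Dict[str, Any]] = {
        node_id: {"class_type": cls, "inputs": dict(widget_inputs.get(node_id, {}))}
        for node_id, cls in node_types.items()
    }
    for src, out_idx, dst, input_name in links:
        # Reject links that point at nodes the graph does not declare.
        if src not in workflow or dst not in workflow:
            raise ValueError(f"unknown node in link {src}->{dst}")
        # Reject a second link (or widget value) for the same input slot.
        if input_name in workflow[dst]["inputs"]:
            raise ValueError(f"input {dst}.{input_name} is already set")
        # ComfyUI's API format encodes a connection as [source_node_id, output_index].
        workflow[dst]["inputs"][input_name] = [src, out_idx]
    return workflow


# Minimal text-to-image graph built from standard ComfyUI nodes.
nodes = {
    "1": "CheckpointLoaderSimple",
    "2": "CLIPTextEncode",
    "3": "CLIPTextEncode",
    "4": "EmptyLatentImage",
    "5": "KSampler",
    "6": "VAEDecode",
    "7": "SaveImage",
}
widgets = {
    "1": {"ckpt_name": "model.safetensors"},  # placeholder file name
    "2": {"text": "a watercolor fox in a snowy forest"},
    "3": {"text": "blurry, low quality"},
    "4": {"width": 512, "height": 512, "batch_size": 1},
    "5": {"seed": 0, "steps": 20, "cfg": 7.0, "sampler_name": "euler",
          "scheduler": "normal", "denoise": 1.0},
    "7": {"filename_prefix": "fox"},
}
links = [
    ("1", 0, "5", "model"),         # MODEL -> KSampler
    ("1", 1, "2", "clip"),          # CLIP -> positive prompt encoder
    ("1", 1, "3", "clip"),          # CLIP -> negative prompt encoder
    ("2", 0, "5", "positive"),
    ("3", 0, "5", "negative"),
    ("4", 0, "5", "latent_image"),
    ("5", 0, "6", "samples"),
    ("1", 2, "6", "vae"),           # VAE -> decoder
    ("6", 0, "7", "images"),
]

workflow = assemble_workflow(nodes, widgets, links)
print(json.dumps(workflow["5"], indent=2))
```

In this sketch, keeping each link as a separate record lets a dangling node reference or a doubly assigned input be rejected at the offending link. This is only a structural check of our own and is not the paper's Format Validation (FV) metric, whose definition is not given here.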