ComfyUI-R1: Exploring Reasoning Models for Workflow Generation

📅 2025-06-11
🤖 AI Summary
Users face high barriers to constructing customized workflows on visual AI painting platforms (e.g., ComfyUI), which keeps AI content creation out of reach for non-experts. Method: We propose the first large reasoning model (7B) for automated workflow generation, featuring: (1) a novel long-chain reasoning paradigm tailored to visual workflows; (2) a two-stage training framework: chain-of-thought (CoT) fine-tuning followed by reinforcement learning with hybrid rule- and metric-based rewards; and (3) explicit encoding of workflows as executable code. Contribution/Results: Our model achieves a 97% format validity rate and significantly outperforms methods built on closed-source models (e.g., GPT-4o, Claude) in node-level and graph-level F1 scores. It demonstrates state-of-the-art performance on complex, multi-node workflow synthesis tasks. This work pioneers the systematic integration of structured reasoning with visual workflow generation, establishing a new paradigm for accessible, workflow-driven AI content creation.

📝 Abstract
AI-generated content has evolved from monolithic models to modular workflows, particularly on platforms like ComfyUI, enabling customization in creative pipelines. However, crafting effective workflows requires considerable expertise to orchestrate numerous specialized components, presenting a steep learning curve for users. To address this challenge, we introduce ComfyUI-R1, the first large reasoning model for automated workflow generation. Starting with our curated dataset of 4K workflows, we construct long chain-of-thought (CoT) reasoning data, including node selection, workflow planning, and code-level workflow representation. ComfyUI-R1 is trained through a two-stage framework: (1) CoT fine-tuning for cold start, adapting models to the ComfyUI domain; (2) reinforcement learning for incentivizing reasoning capability, guided by a fine-grained rule-metric hybrid reward, ensuring format validity, structural integrity, and node-level fidelity. Experiments show that our 7B-parameter model achieves a 97% format validity rate, along with a high pass rate and high node-level and graph-level F1 scores, significantly surpassing prior state-of-the-art methods that employ leading closed-source models such as GPT-4o and the Claude series. Further analysis highlights the critical role of the reasoning process and the advantage of transforming workflows into code. Qualitative comparison shows the model's strength in synthesizing intricate workflows with diverse nodes, underscoring the potential of long CoT reasoning in AI art creation.
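To make the "rule-metric hybrid reward" more concrete, here is a minimal sketch of how rule checks (format validity, structural integrity) could gate a metric score (node-level F1). The abstract does not give the paper's reward formula, so everything here is an assumption for illustration: the function names `node_f1`, `is_link`, and `hybrid_reward` are hypothetical, the all-or-nothing gating is a guess, and the workflow is taken to be in ComfyUI's API-style JSON layout (`class_type`, `inputs`, links written as `[node_id, output_index]`) rather than the paper's code representation.

```python
from collections import Counter


def node_f1(predicted_types, reference_types):
    """F1 between two workflows, each viewed as a multiset of node type names."""
    pred, ref = Counter(predicted_types), Counter(reference_types)
    overlap = sum((pred & ref).values())  # node types present in both
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def is_link(value):
    """True for an API-style link of the form [source_node_id, output_index]."""
    return (isinstance(value, list) and len(value) == 2
            and isinstance(value[0], str) and isinstance(value[1], int))


def hybrid_reward(workflow, reference_types):
    """Hypothetical reward: rule checks gate the score, node F1 sets its size."""
    # Rule 1, format validity: a non-empty dict of well-formed nodes.
    if not isinstance(workflow, dict) or not workflow:
        return 0.0
    for node in workflow.values():
        if not isinstance(node, dict):
            return 0.0
        if not isinstance(node.get("class_type"), str):
            return 0.0
        if not isinstance(node.get("inputs"), dict):
            return 0.0
    # Rule 2, structural integrity: every link targets an existing node.
    for node in workflow.values():
        for value in node["inputs"].values():
            if is_link(value) and value[0] not in workflow:
                return 0.0
    # Metric, node-level fidelity against the reference workflow.
    predicted_types = [node["class_type"] for node in workflow.values()]
    return node_f1(predicted_types, reference_types)
```

In this sketch a malformed or dangling workflow scores 0.0 regardless of how many correct nodes it contains, while a well-formed one is scored by its overlap with the reference. A real reward would likely weight the components more finely and would also need a graph-level (edge) term.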
Problem

Research questions and friction points this paper is trying to address.

Automating workflow generation for AI content creation
Reducing expertise needed for modular workflow orchestration
Improving reasoning models for complex workflow planning
Innovation

Methods, ideas, or system contributions that make the work stand out.

First large reasoning model for workflow automation
Two-stage training with CoT and reinforcement learning
Transforms workflows into code for better fidelity
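
As an illustration of what "transforming workflows into code" can mean, the sketch below turns a ComfyUI API-style JSON graph into a sequence of assignment statements, one per node, ordered so that each node appears after the nodes it depends on. The output syntax (`n2 = CLIPTextEncode(..., clip=n1[1])`) and the helper name `workflow_to_code` are assumptions made for this example; the summary above does not specify the code format the paper actually uses.

```python
def workflow_to_code(workflow):
    """Render an API-style workflow dict as dependency-ordered pseudo-Python.

    Hypothetical helper: each node becomes `n<id> = <class_type>(<inputs>)`,
    and a link [source_id, output_index] becomes `n<source_id>[<output_index>]`.
    """
    lines, visited = [], set()

    def emit(node_id):
        if node_id in visited:
            return
        visited.add(node_id)  # mark first so a cyclic graph cannot recurse forever
        node = workflow[node_id]
        args = []
        for name, value in node["inputs"].items():
            is_link = (isinstance(value, list) and len(value) == 2
                       and str(value[0]) in workflow)
            if is_link:
                emit(str(value[0]))  # make sure the source node is written first
                args.append(f"{name}=n{value[0]}[{value[1]}]")
            else:
                args.append(f"{name}={value!r}")
        lines.append(f"n{node_id} = {node['class_type']}({', '.join(args)})")

    for node_id in workflow:
        emit(node_id)
    return "\n".join(lines)


example = {
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a cat", "clip": ["1", 1]}},
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "model.safetensors"}},
}
print(workflow_to_code(example))
```

One plausible reason a code-like form helps is that data dependencies become explicit variable references instead of ID pairs buried in nested JSON, which is closer to the text a language model has seen during pretraining.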