ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback

📅 2025-05-23
🤖 AI Summary
Existing open-source generative frameworks are fragile on complex multimodal tasks because they lack structured workflow planning and execution-level feedback. This paper introduces ComfyMind, a collaborative AI system for general-purpose generation built on the ComfyUI platform. It combines three components: (1) a Semantic Workflow Interface (SWI) that wraps low-level node graphs as functional modules described in natural language, so workflows can be composed at a high level; (2) a search-tree-based planning mechanism with localized feedback that supports hierarchical decision-making and adaptive correction during generation; and (3) integration with ComfyUI, which supplies the node graphs to be encapsulated and the execution feedback used for correction. On three benchmarks (ComfyBench, GenEval, and Reason-Edit), the system consistently outperforms existing open-source baselines, and its generation and editing performance is comparable to that of GPT-Image-1, indicating improved robustness and controllability in multimodal generative workflows.

📝 Abstract
With the rapid advancement of generative models, general-purpose generation has gained increasing attention as a promising approach to unifying diverse tasks across modalities within a single system. Despite this progress, existing open-source frameworks often remain fragile and struggle to support complex real-world applications because they lack structured workflow planning and execution-level feedback. To address these limitations, we present ComfyMind, a collaborative AI system built on the ComfyUI platform and designed to enable robust and scalable general-purpose generation. ComfyMind introduces two core innovations: a Semantic Workflow Interface (SWI), which abstracts low-level node graphs into callable functional modules described in natural language, enabling high-level composition and reducing structural errors; and a Search Tree Planning mechanism with localized feedback execution, which models generation as a hierarchical decision process and allows adaptive correction at each stage. Together, these components improve the stability and flexibility of complex generative workflows. We evaluate ComfyMind on three public benchmarks, ComfyBench, GenEval, and Reason-Edit, which span generation, editing, and reasoning tasks. Results show that ComfyMind consistently outperforms existing open-source baselines and achieves performance comparable to GPT-Image-1. ComfyMind paves a promising path for the development of open-source general-purpose generative AI systems. Project page: https://github.com/LitaoGuo/ComfyMind
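The SWI idea of hiding a node graph behind a named function with a natural-language description can be sketched as follows. This is a minimal illustration under stated assumptions, not ComfyMind's implementation: `WorkflowModule`, `SemanticWorkflowInterface`, `bindings`, and `describe` are hypothetical names, and the graph is a toy dict loosely modeled on ComfyUI's API-format JSON with most required node inputs omitted. The planner would see only the text returned by `describe()`, while parameter values are written into a copy of the hidden graph.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

# Toy stand-in for a ComfyUI-style node graph: node id -> {"class_type", "inputs"}.
NodeGraph = Dict[str, Dict[str, Any]]


@dataclass
class WorkflowModule:
    """A low-level node graph exposed as a callable with a natural-language description."""
    name: str
    description: str                       # the only text the planner reads
    template: NodeGraph                    # hidden node graph, never edited in place
    bindings: Dict[str, Tuple[str, str]]   # parameter -> (node id, input key)

    def __call__(self, **kwargs: Any) -> NodeGraph:
        missing = set(self.bindings) - set(kwargs)
        unknown = set(kwargs) - set(self.bindings)
        if missing or unknown:
            # Reject malformed calls before anything reaches the node graph.
            raise TypeError(f"{self.name}: missing={sorted(missing)} unknown={sorted(unknown)}")
        graph = copy.deepcopy(self.template)
        for param, value in kwargs.items():
            node_id, key = self.bindings[param]
            graph[node_id]["inputs"][key] = value
        return graph


@dataclass
class SemanticWorkflowInterface:
    """Hypothetical registry of modules that a planner composes by name."""
    modules: Dict[str, WorkflowModule] = field(default_factory=dict)

    def register(self, module: WorkflowModule) -> None:
        self.modules[module.name] = module

    def describe(self) -> str:
        # One line per module: signature plus natural-language description.
        return "\n".join(
            f"- {m.name}({', '.join(m.bindings)}): {m.description}"
            for m in self.modules.values()
        )

    def call(self, name: str, **kwargs: Any) -> NodeGraph:
        return self.modules[name](**kwargs)


# Hypothetical, incomplete graph: real workflows need many more nodes and inputs.
text_to_image = WorkflowModule(
    name="text_to_image",
    description="Generate an image from a text prompt.",
    template={
        "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": "model.safetensors"}},
        "2": {"class_type": "CLIPTextEncode", "inputs": {"text": "", "clip": ["1", 1]}},
        "3": {"class_type": "KSampler", "inputs": {"seed": 0, "model": ["1", 0], "positive": ["2", 0]}},
    },
    bindings={"prompt": ("2", "text"), "seed": ("3", "seed")},
)

swi = SemanticWorkflowInterface()
swi.register(text_to_image)
print(swi.describe())  # - text_to_image(prompt, seed): Generate an image from a text prompt.
graph = swi.call("text_to_image", prompt="a red fox in snow", seed=42)
```

Because the planner composes modules by name and validated parameters rather than editing node links directly, a whole class of wiring mistakes cannot occur at the planning level; this is one plausible reading of how SWI reduces structural errors.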
Problem

Research questions and friction points this paper is trying to address.

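How to make open-source general-purpose generation robust when existing frameworks lack structured workflow planning
How to reduce structural errors in workflow construction using natural-language functional modules and execution-level feedback
How to keep complex generative workflows stable across generation, editing, and reasoning benchmarks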
Innovation

Methods, ideas, or system contributions that make the work stand out.

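Semantic Workflow Interface (SWI) that abstracts low-level node graphs into callable functional modules described in natural language
Search Tree Planning mechanism with localized feedback execution
Modeling generation as a hierarchical decision process that allows adaptive correction at each stage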
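Search Tree Planning with localized feedback can be illustrated with a depth-first search over partial plans, in which a failed step prunes only its own branch and the search falls back to a sibling candidate instead of restarting the whole plan. This is a generic backtracking sketch under stated assumptions, not the paper's algorithm: `propose` stands in for the LLM planner suggesting candidate modules, `execute` stands in for running a workflow and reading its execution feedback, and every name as well as the toy string-tuple state are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

State = Tuple[str, ...]  # toy stand-in: artifacts produced so far


@dataclass
class PlanNode:
    state: State
    step: Optional[str] = None            # module that produced this state (None at root)
    parent: Optional["PlanNode"] = None
    children: List["PlanNode"] = field(default_factory=list)

    def path(self) -> List[str]:
        # Walk back to the root to recover the sequence of executed steps.
        node, steps = self, []
        while node is not None and node.step is not None:
            steps.append(node.step)
            node = node.parent
        return list(reversed(steps))


def search_tree_plan(
    root_state: State,
    propose: Callable[[State], List[str]],
    execute: Callable[[str, State], Tuple[bool, State]],
    is_goal: Callable[[State], bool],
    max_depth: int = 5,
) -> Optional[List[str]]:
    def expand(node: PlanNode, depth: int) -> Optional[PlanNode]:
        if is_goal(node.state):
            return node
        if depth == max_depth:
            return None
        for step in propose(node.state):
            ok, new_state = execute(step, node.state)
            if not ok:
                continue  # localized feedback: drop this branch only, try the next sibling
            child = PlanNode(new_state, step, node)
            node.children.append(child)
            found = expand(child, depth + 1)
            if found is not None:
                return found
        return None  # every candidate failed here, so the caller backtracks one level

    result = expand(PlanNode(root_state), 0)
    return result.path() if result is not None else None


# Hypothetical toy tools: "broken_upscale" always reports an execution error.
def propose(state: State) -> List[str]:
    return ["text_to_image"] if "image" not in state else ["broken_upscale", "upscale"]


def execute(step: str, state: State) -> Tuple[bool, State]:
    if step == "text_to_image":
        return True, state + ("image",)
    if step == "upscale" and "image" in state:
        return True, state + ("hires",)
    return False, state


plan = search_tree_plan((), propose, execute, lambda s: "hires" in s)
print(plan)  # ['text_to_image', 'upscale']
```

Because the failure of `broken_upscale` is handled at the node where it occurred, the image already produced by `text_to_image` is kept rather than regenerated; a full restart would discard it.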