🤖 AI Summary
Existing open-source generative frameworks are fragile on complex multimodal tasks because they lack structured workflow planning and execution-level feedback. This paper introduces ComfyMind, a collaborative AI system for general-purpose generation built on the ComfyUI platform. It combines two core innovations: (1) a Semantic Workflow Interface (SWI), which wraps low-level node graphs as functional modules described in natural language, enabling high-level modular orchestration; and (2) a search-tree-based planning mechanism with local feedback, which supports hierarchical decision-making and adaptive correction at each stage of generation. Evaluated on three benchmarks (ComfyBench, GenEval, and Reason-Edit), the system consistently outperforms existing open-source baselines, and its generation and editing performance is comparable to that of GPT-Image-1, indicating improved robustness and controllability in multimodal generative workflows.
📝 Abstract
With the rapid advancement of generative models, general-purpose generation has gained increasing attention as a promising approach to unifying diverse tasks across modalities within a single system. Despite this progress, existing open-source frameworks often remain fragile and struggle to support complex real-world applications because they lack structured workflow planning and execution-level feedback. To address these limitations, we present ComfyMind, a collaborative AI system built on the ComfyUI platform and designed to enable robust and scalable general-purpose generation. ComfyMind introduces two core innovations: a Semantic Workflow Interface (SWI), which abstracts low-level node graphs into callable functional modules described in natural language, enabling high-level composition and reducing structural errors; and a Search Tree Planning mechanism with localized feedback execution, which models generation as a hierarchical decision process and allows adaptive correction at each stage. Together, these components improve the stability and flexibility of complex generative workflows. We evaluate ComfyMind on three public benchmarks, ComfyBench, GenEval, and Reason-Edit, which span generation, editing, and reasoning tasks. Results show that ComfyMind consistently outperforms existing open-source baselines and achieves performance comparable to GPT-Image-1. ComfyMind paves a promising path for the development of open-source general-purpose generative AI systems. Project page: https://github.com/LitaoGuo/ComfyMind
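The two ideas in the abstract can be illustrated with a small toy sketch. It is not ComfyMind's implementation: the `Workflow` class, the `plan` function, and the demo workflows are hypothetical names invented for illustration, and the "workflows" are plain Python callables rather than ComfyUI node graphs. Each workflow carries a natural-language description (in the spirit of the SWI), and the planner searches a tree with one level per stage, using each execution result as local feedback so that a failure only triggers a retry with a sibling candidate at that stage.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Workflow:
    """Hypothetical stand-in for a semantic workflow module.

    The description is what a planner would read; `run` hides the
    underlying node graph and reports whether execution succeeded.
    """
    name: str
    description: str
    run: Callable[[str], Tuple[bool, str]]  # state -> (success, new state)


def plan(state: str,
         stages: List[List[Workflow]],
         trace: Optional[List[str]] = None) -> Optional[List[str]]:
    """Depth-first search over workflow choices, one tree level per stage.

    A failed execution rejects only the current candidate: the search
    tries the next sibling at the same level and backtracks one level
    up only when every sibling at this level has failed.
    """
    trace = [] if trace is None else trace
    if not stages:
        return trace  # every stage completed
    for wf in stages[0]:
        ok, new_state = wf.run(state)  # execution-level feedback
        if not ok:
            continue  # local correction: try a sibling candidate
        result = plan(new_state, stages[1:], trace + [wf.name])
        if result is not None:
            return result
    return None  # all candidates at this stage failed


def demo_stages() -> List[List[Workflow]]:
    """Toy two-stage task: generate an image, then upscale it."""
    text2img_a = Workflow("text2img_a", "Generate an image from a prompt "
                          "(toy: always fails)", lambda s: (False, s))
    text2img_b = Workflow("text2img_b", "Generate an image from a prompt",
                          lambda s: (True, s + " -> image"))
    upscale = Workflow("upscale", "Upscale an image",
                       lambda s: (True, s + " -> upscaled"))
    return [[text2img_a, text2img_b], [upscale]]


if __name__ == "__main__":
    print(plan("a cat on a sofa", demo_stages()))
```

In this toy run the first generation candidate fails, so the planner falls back to its sibling at the same stage and then proceeds to upscaling, without restarting the whole task. In a real system the candidate list per stage would presumably be chosen by a language model reading the workflow descriptions; that part is omitted here.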