🤖 AI Summary
This study addresses the lack of a unified cognitive theoretical foundation for general-purpose autonomous agents. We propose UMM, the first full-stack cognitive architecture integrating Global Workspace Theory (GWT) with large language models (LLMs). UMM holistically models human-level cognitive capabilities—including multimodal perception, hierarchical planning, neuro-symbolic reasoning, external memory, tool invocation, and reflective learning—overcoming the fragmentation inherent in conventional agent architectures. To enable rapid deployment, we introduce MindOS, a zero-code construction engine supporting domain-specific agent development within minutes. Empirical evaluations demonstrate that UMM significantly outperforms existing frameworks in complex task planning, cross-tool coordination, and continual learning, while maintaining strong scalability and interpretability.
📝 Abstract
Large language models (LLMs) have recently demonstrated remarkable capabilities across domains, tasks, and languages (e.g., ChatGPT and GPT-4), reviving the research of general autonomous agents with human-like cognitive abilities.Such human-level agents require semantic comprehension and instruction-following capabilities, which exactly fall into the strengths of LLMs.Although there have been several initial attempts to build human-level agents based on LLMs, the theoretical foundation remains a challenging open problem. In this paper, we propose a novel theoretical cognitive architecture, the Unified Mind Model (UMM), which offers guidance to facilitate the rapid creation of autonomous agents with human-level cognitive abilities. Specifically, our UMM starts with the global workspace theory and further leverage LLMs to enable the agent with various cognitive abilities, such as multi-modal perception, planning, reasoning, tool use, learning, memory, reflection and motivation. Building upon UMM, we then develop an agent-building engine, MindOS, which allows users to quickly create domain-/task-specific autonomous agents without any programming effort.