Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots

📅 2024-09-16
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Problem: Existing general-purpose agent systems mostly act as copilots that depend on users for state information, and they lack end-to-end, closed-loop autonomy. Method: This paper introduces Cognitive Kernel, an open-source, model-centric agent system that uses a fine-tuned large language model (LLM) as its central policy model. The policy model initiates interactions with the environment by combining atomic actions (e.g., opening files, clicking buttons, saving intermediate results to memory, and calling the LLM itself), rather than selecting from a fixed, task-specific action set as in environment-centric designs. This lets the system understand user intent, actively gather information from multiple sources, and make decisions without manually supplied state. The system is fully dockerized for private deployment. Results: Across three use cases (real-time information management, private information management, and long-term memory management), it achieves better or comparable performance relative to closed-source systems. Both the system and the backbone model are open-sourced.

📝 Abstract
We introduce Cognitive Kernel, an open-source agent system towards the goal of generalist autopilots. Unlike copilot systems, which primarily rely on users to provide essential state information (e.g., task descriptions) and assist users by answering questions or auto-completing contents, autopilot systems must complete tasks from start to finish independently, which requires the system to acquire the state information from the environments actively. To achieve this, an autopilot system should be capable of understanding user intents, actively gathering necessary information from various real-world sources, and making wise decisions. Cognitive Kernel adopts a model-centric design. In our implementation, the central policy model (a fine-tuned LLM) initiates interactions with the environment using a combination of atomic actions, such as opening files, clicking buttons, saving intermediate results to memory, or calling the LLM itself. This differs from the widely used environment-centric design, where a task-specific environment with predefined actions is fixed, and the policy model is limited to selecting the correct action from a given set of options. Our design facilitates seamless information flow across various sources and provides greater flexibility. We evaluate our system in three use cases: real-time information management, private information management, and long-term memory management. The results demonstrate that Cognitive Kernel achieves better or comparable performance to other closed-source systems in these scenarios. Cognitive Kernel is fully dockerized, ensuring everyone can deploy it privately and securely. We open-source the system and the backbone model to encourage further research on LLM-driven autopilot systems.
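The model-centric design described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the system's actual API: the action names (`open_file`, `save_to_memory`, `call_llm`), the stubbed policy, and the choice to represent the model's plan as a short Python snippet are all hypothetical. The point is only that the policy composes atomic actions itself instead of picking one option from a fixed, environment-defined menu.

```python
from typing import Callable, Dict, List


def make_actions(files: Dict[str, str], memory: List[str]) -> Dict[str, Callable]:
    """Build hypothetical atomic actions over an in-memory 'environment'."""

    def open_file(path: str) -> str:
        # Return the file content, or an empty string if the file is missing.
        return files.get(path, "")

    def save_to_memory(note: str) -> None:
        # Persist an intermediate result.
        memory.append(note)

    def call_llm(prompt: str) -> str:
        # Stand-in for a real model call: keep only the first sentence.
        return prompt.split(".")[0].strip() + "."

    return {
        "open_file": open_file,
        "save_to_memory": save_to_memory,
        "call_llm": call_llm,
    }


def stub_policy(task: str) -> str:
    """Stand-in for the fine-tuned LLM: emit a plan that combines atomic actions."""
    return (
        "text = open_file('notes.txt')\n"
        "summary = call_llm(text)\n"
        "save_to_memory(summary)\n"
        "result = summary\n"
    )


def run_model_centric(task: str, files: Dict[str, str], memory: List[str]) -> str:
    """Let the policy drive the interaction by executing the plan it produced."""
    scope = dict(make_actions(files, memory))
    plan = stub_policy(task)
    # Emptying __builtins__ is NOT a security sandbox; a real deployment
    # would isolate execution (the paper's system runs inside Docker).
    exec(plan, {"__builtins__": {}}, scope)
    return scope["result"]


if __name__ == "__main__":
    files = {"notes.txt": "Meeting moved to 3pm. Bring laptop."}
    memory: List[str] = []
    print(run_model_centric("summarize my notes", files, memory))
    print(memory)
```

In an environment-centric setup, by contrast, the environment would expose a fixed list of task-specific actions and the policy would only return the index of the one to take at each step.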
Problem

Research questions and friction points this paper is trying to address.

Autonomous Systems
Complex Task Execution
Artificial Intelligence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cognitive Kernel
Generalist Autopilot Agents
Docker Technology
Hongming Zhang
Cognitive Kernel Team, Tencent AI Lab, Seattle
Xiaoman Pan
Amazon
Large Language Models, Machine Learning, Natural Language Processing
Hongwei Wang
Cognitive Kernel Team, Tencent AI Lab, Seattle
Kaixin Ma
Researcher, Apple
LLMs, Multimodal Foundation Models, Agents
Wenhao Yu
Cognitive Kernel Team, Tencent AI Lab, Seattle
Dong Yu
Cognitive Kernel Team, Tencent AI Lab, Seattle