Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution

📅 2025-05-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM-based agent frameworks rely on manually predefined tools and workflows, severely limiting cross-domain adaptability, scalability, and generalization. Method: We propose Alita, a lightweight, universal agent architecture grounded in a “minimal predefinition + maximal self-evolution” paradigm. Alita employs only a single core component—a large language model–driven meta-reasoning framework—that autonomously generates, refines, and reuses open-source Model Context Protocols (MCPs) as shareable, executable capability carriers, enabling zero-hard-coded capability expansion. It eliminates conventional toolchains and supports end-to-end, cross-domain autonomous evolution. Contribution/Results: Evaluated on GAIA, MathVista, and PathVQA, Alita achieves pass@1 scores of 75.15%, 74.00%, and 52.00%, respectively—surpassing state-of-the-art complex agent systems and establishing a new GAIA SOTA.

📝 Abstract
Recent advances in large language models (LLMs) have enabled agents to autonomously perform complex, open-ended tasks. However, many existing frameworks depend heavily on manually predefined tools and workflows, which hinder their adaptability, scalability, and generalization across domains. In this work, we introduce Alita--a generalist agent designed with the principle of "Simplicity is the ultimate sophistication," enabling scalable agentic reasoning through minimal predefinition and maximal self-evolution. For minimal predefinition, Alita is equipped with only one component for direct problem-solving, making it much simpler and neater than previous approaches that relied heavily on hand-crafted, elaborate tools and workflows. This clean design enhances its potential to generalize to challenging questions without being limited by tools. For maximal self-evolution, we enable the creativity of Alita by providing a suite of general-purpose components to autonomously construct, refine, and reuse external capabilities by generating task-related Model Context Protocols (MCPs) from open source, which contributes to scalable agentic reasoning. Notably, Alita achieves 75.15% pass@1 and 87.27% pass@3 accuracy on the GAIA benchmark validation dataset, top-ranking among general-purpose agents, and 74.00% and 52.00% pass@1 on MathVista and PathVQA, respectively, outperforming many agent systems with far greater complexity. More details will be updated at https://github.com/CharlesQ9/Alita.
Problem

Research questions and friction points this paper is trying to address.

Reducing dependency on predefined tools for adaptability
Enhancing scalability through minimal predefinition in agents
Promoting self-evolution for autonomous capability construction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Minimal predefinition with one problem-solving component
Maximal self-evolution via general-purpose components
Autonomous construction of task-related MCPs
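The self-evolution loop described above, in which the agent generates a capability on demand, registers it as a reusable MCP-like carrier, and reuses it on later tasks, can be sketched minimally. This is a hypothetical illustration, not the paper's actual implementation: `MCPRegistry`, `generate_tool`, and `solve` are invented names, and the stub generator stands in for the LLM-driven MCP synthesis step.

```python
# Hypothetical sketch of the "minimal predefinition + maximal self-evolution"
# loop: no hand-coded toolchain, only a cache of generated capabilities
# (stand-ins for MCPs). All names here are illustrative, not from the paper.
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class MCPRegistry:
    """Cache of generated capabilities, keyed by task signature."""
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def lookup(self, signature: str):
        return self.tools.get(signature)

    def register(self, signature: str, tool: Callable[[str], str]) -> None:
        self.tools[signature] = tool


def generate_tool(task: str) -> Callable[[str], str]:
    """Stand-in for the LLM-driven generator: in Alita this step would
    synthesize, validate, and package an executable MCP from open source."""
    def tool(payload: str) -> str:
        return f"solved[{task}]: {payload}"
    return tool


def solve(task: str, payload: str, registry: MCPRegistry) -> str:
    """Reuse a cached capability if one exists; otherwise generate and
    register a new one (the self-evolution step), then apply it."""
    tool = registry.lookup(task)
    if tool is None:
        tool = generate_tool(task)
        registry.register(task, tool)  # shareable, reusable on later calls
    return tool(payload)


registry = MCPRegistry()
first = solve("web-search", "query A", registry)   # generates a new tool
second = solve("web-search", "query B", registry)  # reuses the cached tool
```

The key design point this sketch captures is that capability expansion happens at solve time rather than at framework-definition time: the registry starts empty, and everything in it was produced by the agent itself.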
Jiahao Qiu
Princeton University
LLM, AI Agents, AI for X
Xuan Qi
Undergraduate, Tsinghua University
Natural Language Processing, Multi-modal Language Model
Tongcheng Zhang
Shanghai Jiao Tong University
Xinzhe Juan
University of Michigan
AI Agent, AI4Science
Jiacheng Guo
AI Lab, Princeton University
Yifu Lu
Undergraduate, University of Michigan
Computer Science
Yimin Wang
Shanghai Jiao Tong University, University of Michigan
Zixin Yao
AI Lab, Princeton University
Qihan Ren
Shanghai Jiao Tong University
Explainable AI, Machine Learning, Computer Vision, Natural Language Processing
Xun Jiang
Tianqiao and Chrissy Chen Institute
Xing Zhou
Computer Science, University of Illinois at Urbana-Champaign
Compiler Optimizations
Dongrui Liu
Shanghai Jiao Tong University
Ling Yang
Postdoc@Princeton University, PhD@Peking University
LLM, Diffusion Models, Reinforcement Learning, Complex Data Modeling
Yue Wu
AI Lab, Princeton University
Kaixuan Huang
AI Lab, Princeton University
Shilong Liu
RS@ByteDance, PhD@THU
Computer Vision, Object Detection, Visual Grounding, Multi-Modality, Multimodal Agent
Hongru Wang
The Chinese University of Hong Kong
Mengdi Wang
AI Lab, Princeton University