LifelongAgentBench: Evaluating LLM Agents as Lifelong Learners

📅 2025-05-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current LLM-based agents lack lifelong learning capability, hindering continuous knowledge accumulation and transfer in dynamic environments; no standardized benchmark exists for systematic evaluation. Method: We propose LifelongAgentBench—the first dedicated benchmark for evaluating lifelong learning in LLM agents—spanning database, operating system, and knowledge graph interaction environments. Contribution/Results: (1) We formally define and quantify lifelong learning capability for LLM agents; (2) we design skill-driven tasks and modular environment interfaces enabling automated annotation and scalable assessment; (3) we introduce intra-group self-consistent reasoning to mitigate context-length constraints and noise sensitivity inherent in conventional experience replay. Experiments demonstrate significant improvements in cross-task knowledge accumulation and transfer, validating the measurability, trainability, and scalability of lifelong learning. LifelongAgentBench establishes a new paradigm for developing adaptive, memory-augmented intelligent agents.

📝 Abstract
Lifelong learning is essential for intelligent agents operating in dynamic environments. Current large language model (LLM)-based agents, however, remain stateless and unable to accumulate or transfer knowledge over time. Existing benchmarks treat agents as static systems and fail to evaluate lifelong learning capabilities. We present LifelongAgentBench, the first unified benchmark designed to systematically assess the lifelong learning ability of LLM agents. It provides skill-grounded, interdependent tasks across three interactive environments (Database, Operating System, and Knowledge Graph), with automatic label verification, reproducibility, and modular extensibility. Extensive experiments reveal that conventional experience replay has limited effectiveness for LLM agents due to irrelevant information and context length constraints. We further introduce a group self-consistency mechanism that significantly improves lifelong learning performance. We hope LifelongAgentBench will advance the development of adaptive, memory-capable LLM agents.
Problem

Research questions and friction points this paper is trying to address.

Assessing lifelong learning in LLM agents
Overcoming statelessness in dynamic environments
Improving knowledge accumulation and transfer
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified benchmark for lifelong learning assessment
Group self-consistency mechanism improves learning
Skill-grounded tasks with automatic verification
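The group self-consistency idea above can be sketched in code. The paper's actual implementation is not shown on this page, so the following is a minimal, hypothetical Python sketch: it assumes the mechanism partitions retrieved past experiences into small groups (so no single prompt exceeds the context window), queries the agent once per group, and majority-votes across the per-group answers to suppress noise from irrelevant experiences. The names `group_self_consistency` and `toy_agent` are illustrative, not the benchmark's API.

```python
from collections import Counter
from typing import Callable, List

def group_self_consistency(
    experiences: List[str],
    query: str,
    agent: Callable[[str], str],
    group_size: int = 3,
) -> str:
    """Split past experiences into fixed-size groups, query the agent
    once per group, and return the majority-vote answer."""
    groups = [experiences[i:i + group_size]
              for i in range(0, len(experiences), group_size)]
    answers = []
    for group in groups:
        # Each prompt carries only one group's experiences, keeping
        # every individual call within the context-length budget.
        prompt = "\n".join(group) + "\n" + query
        answers.append(agent(prompt))
    # Majority voting across groups damps the effect of a noisy or
    # irrelevant experience that corrupts any single group's answer.
    return Counter(answers).most_common(1)[0][0]

# Toy stand-in for an LLM agent: it is misled ("B") only when a prompt
# contains two or more distracting experiences.
def toy_agent(prompt: str) -> str:
    return "B" if prompt.count("noisy") >= 2 else "A"

result = group_self_consistency(
    ["useful demo 1", "noisy demo", "useful demo 2",
     "useful demo 3", "noisy demo", "useful demo 4"],
    "solve the task",
    toy_agent,
    group_size=3,
)
print(result)  # the two groups each see only one noisy demo, so both answer "A"
```

Because the noisy experiences are spread across groups, no single prompt accumulates enough of them to flip the agent's answer, which is the intuition behind replacing monolithic experience replay with grouped voting.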
👥 Authors
Junhao Zheng, South China University of Technology; Qwen Team
Xidi Cai, South China University of Technology
Qiuke Li, South China University of Technology
Duzhen Zhang, Institute of Automation, Chinese Academy of Sciences
ZhongZhi Li, Chinese Academy of Sciences
Yingying Zhang, East China Normal University
Le Song, CTO, GenBio AI; Professor, MBZUAI
Qianli Ma, South China University of Technology