Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

📅 2026-04-20

🤖 AI Summary
Training general-purpose agents faces significant challenges due to the scarcity of real-world environments and inadequate mechanisms for continual learning. This work proposes a self-evolving training platform that enables co-evolution between agents and environments through an autonomous environment-task discovery mechanism and a multi-environment reinforcement learning framework. The platform supports dynamic task synthesis driven by capability gaps and integrates a Model Context Protocol toolchain alongside a self-evolving agent arena. Experimental results demonstrate that the trained Agent-World-8B and 14B models substantially outperform strong baselines across 23 benchmarks, confirming the critical role of environmental diversity and iterative self-evolution in enhancing agent intelligence.

📝 Abstract
Large language models are increasingly expected to serve as general-purpose agents that interact with external, stateful tool environments. The Model Context Protocol (MCP) and broader agent skills offer a unified interface for connecting agents with scalable real-world services, but training robust agents remains limited by the lack of realistic environments and principled mechanisms for life-long learning. In this paper, we present Agent-World, a self-evolving training arena for advancing general agent intelligence through scalable environments. Agent-World has two main components: (1) Agentic Environment-Task Discovery, which autonomously explores topic-aligned databases and executable tool ecosystems from thousands of real-world environment themes and synthesizes verifiable tasks with controllable difficulty; and (2) Continuous Self-Evolving Agent Training, which combines multi-environment reinforcement learning with a self-evolving agent arena that automatically identifies capability gaps through dynamic task synthesis and drives targeted learning, enabling the co-evolution of agent policies and environments. Across 23 challenging agent benchmarks, Agent-World-8B and 14B consistently outperform strong proprietary models and environment scaling baselines. Further analyses reveal scaling trends in relation to environment diversity and self-evolution rounds, offering insights for building general agent intelligence.
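The abstract does not give implementation details, so the following is only a toy sketch of the co-evolution idea it describes: evaluate the agent on synthesized tasks, find environments where it underperforms, and train on new tasks targeted at those gaps. Every name here (`Task`, `ToyAgent`, `find_gaps`, `synthesize_tasks`, `self_evolve`), the integer difficulty scale, the 0.5 success threshold, and the "skill increases by one per round" update are hypothetical stand-ins, not the paper's method; a real system would replace `train_on` with multi-environment reinforcement learning and `synthesize_tasks` with tool-backed, verifiable task generation.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    env: str         # environment theme the task belongs to (hypothetical)
    difficulty: int  # hypothetical integer difficulty, 1 = easiest


@dataclass
class ToyAgent:
    # Hypothetical per-environment skill level; a stand-in for a learned policy.
    skill: dict = field(default_factory=dict)

    def solve(self, task: Task) -> bool:
        # Toy verifier: the task succeeds if skill meets its difficulty.
        return self.skill.get(task.env, 0) >= task.difficulty

    def train_on(self, tasks: list) -> None:
        # Placeholder for an RL update: raise skill by one in each
        # environment that appears in the training batch.
        for env in {t.env for t in tasks}:
            self.skill[env] = self.skill.get(env, 0) + 1


def synthesize_tasks(envs, difficulty, per_env=2):
    # Stand-in for task synthesis with controllable difficulty.
    return [Task(env, difficulty) for env in envs for _ in range(per_env)]


def find_gaps(agent, tasks, threshold=0.5):
    # Capability gaps: environments whose success rate is below threshold.
    outcomes = {}
    for t in tasks:
        outcomes.setdefault(t.env, []).append(agent.solve(t))
    return sorted(e for e, r in outcomes.items() if sum(r) / len(r) < threshold)


def self_evolve(agent, envs, rounds=3, difficulty=2):
    history = []
    for _ in range(rounds):
        gaps = find_gaps(agent, synthesize_tasks(envs, difficulty))
        history.append(gaps)
        if gaps:
            # Targeted learning on freshly synthesized tasks for weak environments.
            agent.train_on(synthesize_tasks(gaps, difficulty))
        else:
            # No gaps left: make the environments harder for the next round.
            difficulty += 1
    return history


agent = ToyAgent(skill={"calendar": 2, "sql": 0})
print(self_evolve(agent, ["calendar", "sql"]))
```

In this toy run the agent starts weak in the hypothetical `sql` environment, so the first rounds train only there; once no gap remains, the loop raises difficulty instead, which is the sense in which agent and environments evolve together.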
Problem

Research questions and friction points this paper is trying to address.

general agent intelligence
real-world environment synthesis
lifelong learning
agent training
environment scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agent-World
self-evolving training
environment-task synthesis
multi-environment reinforcement learning
general agent intelligence
Guanting Dong
Renmin University of China
LLM Reasoning & Alignment, Deep Search Agent, Agentic RL
Junting Lu
Peking University
Multimodal Agent
Junjie Huang
College of Computer and Information Science, Southwest University, China
Social Network Analysis, Graph Neural Networks, Computational Social Science
Wanjun Zhong
ByteDance Seed Research
NLP
Longxiang Liu
Renmin University of China, ByteDance Seed
Shijue Huang
Hong Kong University of Science and Technology
Large Language Models, Reasoning, Agent
Zhenyu Li
University of Science and Technology of China
Electronic structure calculations, Molecular simulation, Materials science
Yang Zhao
Research Professor, Zhejiang University, China
Intelligent Building, Smart Grid, Fault detection and diagnosis, Energy efficiency
Xiaoshuai Song
Beijing University of Posts and Telecommunications
Xiaoxi Li
Renmin University of China
RAG, LLM Reasoning, Deep Research, Agent
Jiajie Jin
Renmin University of China
Information Retrieval, Large Language Models
Yutao Zhu
Renmin University of China, ByteDance Seed
Hanbin Wang
Peking University
Natural Language Processing, Code Intelligence, Information Retrieval
Fangyu Lei
Institute of Automation, Chinese Academy of Sciences
LLM-Agent, Code Generation, Text-to-SQL, Table Reasoning
Qinyu Luo
Johns Hopkins University; THUNLP
LLM-driven Agents, Multi-Agent System, Multimodality and Real-world sensing
Mingyang Chen
Baichuan Inc., Zhejiang University, The University of Edinburgh
Large Language Model, Reinforcement Learning, Knowledge Graph
Zehui Chen
USTC
Jiazhan Feng
University of Oxford; PhD at Peking University
Natural Language Processing, Large Language Models, Multimodal Agent
Ji-Rong Wen
Gaoling School of Artificial Intelligence, Renmin University of China
Large Language Model, Web Search, Information Retrieval, Machine Learning
Zhicheng Dou
Renmin University of China
Information Retrieval, Retrieval Augmented Generation, Large Language Models, Generative IR