ScaleEnv: Scaling Environment Synthesis from Scratch for Generalist Interactive Tool-Use Agent Training

📅 2026-02-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the scarcity of interactive training environments and the limited diversity and scalability of existing synthesis approaches, both of which hinder self-exploratory learning for general-purpose agents. The paper introduces a framework for constructing highly diverse, scalable, and verifiable interactive environments entirely from scratch. It ensures environment reliability through procedural testing, and guarantees task completeness and solvability by combining tool dependency graph expansion with executable action validation. The proposed method substantially improves agent performance on unseen multi-turn tool-use benchmarks such as τ²-Bench and VitaBench, demonstrating that the scale and diversity of training environments play a critical role in agent generalization.

📝 Abstract
Training generalist agents capable of adapting to diverse scenarios requires interactive environments for self-exploration. However, interactive environments remain critically scarce, and existing synthesis methods suffer from significant limitations in environmental diversity and scalability. To address these challenges, we introduce ScaleEnv, a framework that constructs fully interactive environments and verifiable tasks entirely from scratch. Specifically, ScaleEnv ensures environment reliability through procedural testing, and guarantees task completeness and solvability via tool dependency graph expansion and executable action verification. By enabling agents to learn through exploration within ScaleEnv, we demonstrate significant performance improvements on unseen, multi-turn tool-use benchmarks such as $\tau^2$-Bench and VitaBench, highlighting strong generalization capabilities. Furthermore, we investigate the relationship between the number of training domains and model generalization performance, providing empirical evidence that scaling environmental diversity is critical for robust agent learning.
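The abstract names two verification mechanisms: expanding a tool dependency graph so that a task's full tool chain is well-defined, and confirming solvability by actually executing the resulting action sequence. The paper does not give implementation details, so the sketch below is purely illustrative; the tool registry, field names, and state-dict convention are all assumptions, not ScaleEnv's actual interfaces.

```python
# Hypothetical sketch of the two checks described in the abstract:
# (1) expand a goal tool into a full chain via its dependency graph, and
# (2) verify task solvability by executing the resulting action sequence.
from graphlib import TopologicalSorter

# Toy tool registry: each tool declares the tools it depends on and an
# executable implementation that reads/writes a shared state dict.
TOOLS = {
    "search_order": {"deps": [], "run": lambda s: s.update(order_id="O-1")},
    "get_order": {"deps": ["search_order"],
                  "run": lambda s: s.update(item=f"item-for-{s['order_id']}")},
    "issue_refund": {"deps": ["get_order"],
                     "run": lambda s: s.update(refunded=True)},
}

def expand_task(goal_tool: str) -> list[str]:
    """Walk the dependency graph from the goal tool and return a
    dependency-respecting execution order (tool dependency graph expansion)."""
    ts = TopologicalSorter()
    stack, seen = [goal_tool], set()
    while stack:
        tool = stack.pop()
        if tool in seen:
            continue
        seen.add(tool)
        ts.add(tool, *TOOLS[tool]["deps"])  # deps must run before tool
        stack.extend(TOOLS[tool]["deps"])
    return list(ts.static_order())

def validate_by_execution(plan: list[str]) -> tuple[bool, dict]:
    """Executable action verification: run the plan step by step;
    the task counts as solvable only if every step succeeds."""
    state: dict = {}
    for tool in plan:
        try:
            TOOLS[tool]["run"](state)
        except Exception:
            return False, state
    return True, state

plan = expand_task("issue_refund")
ok, final_state = validate_by_execution(plan)
```

Running this yields the plan `["search_order", "get_order", "issue_refund"]` and a successful execution, illustrating how graph expansion supplies completeness while execution supplies solvability.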
Problem

Research questions and friction points this paper is trying to address: interactive environments, environment synthesis, generalist agents, tool-use, scalability.
Innovation

Methods, ideas, or system contributions that make the work stand out: environment synthesis, generalist agent, tool-use, procedural testing, task verifiability.
Authors

Dunwei Tu (National Key Laboratory for Novel Software Technology, Nanjing University; School of Artificial Intelligence, Nanjing University, Nanjing, China)
Hongyan Hao (Meituan, Beijing, China)
Hansi Yang (Hong Kong University of Science and Technology)
Yihao Chen (Institute of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China)
Yi-Kai Zhang (Nanjing University)
Zhikang Xia (Meituan, Beijing, China)
Yu Yang (Meituan, Beijing, China)
Yueqing Sun (Meituan, Beijing, China)
Xingchen Liu (School of Statistics, East China Normal University, Shanghai, China)
Furao Shen (Department of Computer Science & Technology, Nanjing University)
Qi Gu (Meituan, Beijing, China)
Hui Su (Meituan)
Xunliang Cai (Meituan, Beijing, China)