🤖 AI Summary
This work addresses the weak generalization and poor robustness of large language models (LLMs) in real-world function-calling tasks. We propose Environment-Extended Agent Training, a novel paradigm for developing general-purpose intelligent agents. Our approach features: (1) an automated framework for generating diverse, high-fidelity, heterogeneous simulation environments; and (2) a two-stage fine-tuning strategy that progressively enhances agent capabilities, from foundational function comprehension to domain-specific function invocation. The method integrates automated environment generation, full-fidelity interactive simulation, and large-scale instruction-action alignment training. Evaluated on tau-bench, tau2-bench, and ACEBench, our AgentScaler achieves substantial improvements in function-call accuracy (+8.2%–14.6%) and cross-environment robustness. These results empirically validate that scaling up environments significantly advances general agentic intelligence.
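The two-stage schedule above can be sketched in code. This is a minimal, hypothetical illustration of the training curriculum only, not the paper's implementation: the names (`Trajectory`, `two_stage_finetune`, `train_step`) and the data layout are assumptions introduced here for clarity.

```python
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    domain: str                      # simulated environment the trajectory came from
    messages: list = field(default_factory=list)  # instruction-action turns

def two_stage_finetune(trajectories, target_domain, train_step):
    """Hypothetical two-stage curriculum:
    stage 1 trains on the full heterogeneous mixture to build
    general function-calling competence; stage 2 continues on
    the target domain only, specializing the agent."""
    stage1 = trajectories                                        # broad mixture
    stage2 = [t for t in trajectories if t.domain == target_domain]
    log = []
    for stage_name, batch in (("general", stage1), ("domain", stage2)):
        for traj in batch:
            log.append((stage_name, train_step(traj)))           # one update per trajectory
    return log

# Usage with a dummy train_step that just reports trajectory length.
data = [
    Trajectory("retail", ["user turn", "tool call"]),
    Trajectory("airline", ["user turn", "tool call", "tool call"]),
]
log = two_stage_finetune(data, "airline", lambda t: len(t.messages))
```

Here stage 1 visits both domains while stage 2 revisits only the `airline` trajectories, mirroring the general-then-specialized progression described in the summary.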
📝 Abstract
Advanced agentic intelligence is a prerequisite for deploying Large Language Models in practical, real-world applications. Diverse real-world APIs demand precise, robust function calling, a capability agents must develop through interaction with varied environments. The breadth of an agent's function-calling competence is closely tied to the diversity of the environments in which it is trained. In this work, we scale up environments as a step toward advancing general agentic intelligence. This raises two central challenges: (i) how to scale environments in a principled manner, and (ii) how to effectively train agentic capabilities from experience gathered through interaction with these environments. To address these, we design a scalable framework that automatically constructs fully simulated, heterogeneous environments, systematically broadening the space of function-calling scenarios. We further adopt a two-phase agent fine-tuning strategy: first endowing agents with fundamental agentic capabilities, then specializing them for domain-specific contexts. Extensive experiments on the agentic benchmarks tau-bench, tau2-bench, and ACEBench demonstrate that our trained model, AgentScaler, significantly improves function-calling capability.