🤖 AI Summary
Current large language models (LLMs) struggle to evolve from passive responders into autonomous agents because scalable, high-fidelity interactive environments supporting reward-driven policy learning are lacking. This work introduces Nex, an integrated, scalable infrastructure for interactive agent training comprising three orthogonal extensions: (1) NexAU, a hierarchical, compositional agent architecture; (2) NexA4A, natural-language-driven, cross-domain automatic agent generation; and (3) NexGAP, which integrates feedback from dynamic real-world environments to improve simulation fidelity. Leveraging reinforcement learning and large-scale trajectory synthesis, the resulting model, Nex-N1, achieves state-of-the-art performance among open-source models on benchmarks including SWE-bench and tau2, matching leading closed-source models. All components -- infrastructure, training pipelines, and the Nex-N1 model -- are fully open-sourced to foster reproducible, community-driven advancement in autonomous agent research.
📝 Abstract
The evolution of Large Language Models (LLMs) from passive responders to autonomous agents necessitates a fundamental shift in learning paradigms -- from static imitation to incentive-driven decision making. However, this transition is significantly impeded by the lack of scalable infrastructure capable of constructing high-quality interaction signals for effective policy learning. To address this, we introduce a comprehensive method that systematically scales the diversity and complexity of interactive environments along three orthogonal dimensions: (1) Complexity: NexAU, a flexible agent framework that supports building complex agent hierarchies via simple configurations; (2) Diversity: NexA4A, which automatically generates diverse agent hierarchies from natural language to cover an open-ended range of domains; and (3) Fidelity: NexGAP, which bridges the simulation-reality gap by integrating dynamic real-world environments for grounded trajectory synthesis. We train Nex-N1 on the diverse and complex interactive environments established by this infrastructure. Empirical results on benchmarks such as SWE-bench and tau2 demonstrate that Nex-N1 consistently outperforms state-of-the-art open-source models and achieves competitive performance against frontier proprietary models on complex agentic tasks. We open-source the Nex ecosystem and model weights to facilitate further research.