🤖 AI Summary
This work addresses the lack of closed-loop interaction in existing 6G network management benchmarks, which prevents agents from learning autonomous decision-making through tool invocation, state observation, and environmental feedback. To bridge this gap, we propose 6GAgentGym, a closed-loop agent training framework tailored for 6G environments. The framework provides 42 typed tools, each classified as either a read-only observation or a state-modifying action, backed by an Experiment Model calibrated on NS-3 simulation data. Training combines Self-Instruct data synthesis, supervised fine-tuning, and reinforcement learning. Built on an 8B open-source foundation model, our approach achieves an overall success rate on 6GAgentBench comparable to that of GPT-5, with stronger performance on long-horizon tasks.
📝 Abstract
Autonomous 6G network management requires agents that can execute tools, observe the resulting state changes, and adapt their decisions accordingly. Existing benchmarks based on static questions or scripted episode replay, however, do not support such closed-loop interaction, limiting agents to passive evaluation without the ability to learn from environmental feedback. This paper presents 6GAgentGym, a framework that provides this closed-loop capability. It offers an interactive environment with 42 typed tools whose effect classification distinguishes read-only observation from state-mutating configuration, backed by a learned Experiment Model calibrated on NS-3 simulation data. The 6G-Forge component bootstraps closed-loop training trajectories from NS-3 seeds via iterative Self-Instruct generation, with execution verification against the Experiment Model. Supervised fine-tuning on the resulting corpus, followed by reinforcement learning with online closed-loop interaction, enables an 8B open-source model to achieve an overall success rate comparable to that of GPT-5 on the accompanying 6GAgentBench, with stronger performance on long-horizon tasks. Together, these components provide a viable path toward autonomous, closed-loop network management.
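To make the idea of typed tools with an effect classification more concrete, the following is a minimal toy sketch of a closed-loop interaction. It is not the 6GAgentGym API: the names `Effect`, `Tool`, `ToyEnv`, `get_kpis`, `set_tx_power`, and `run_episode` are hypothetical, and the linear throughput formula is an invented stand-in for the learned Experiment Model, not a result from the paper. The sketch only illustrates the pattern the abstract describes: read-only tools observe state, state-mutating tools change it, and the agent adapts based on what it observes next.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, float]


class Effect(Enum):
    READ_ONLY = "read_only"  # observation: must not change environment state
    MUTATING = "mutating"    # configuration: may change environment state


@dataclass(frozen=True)
class Tool:
    name: str
    effect: Effect
    fn: Callable[[State, Dict[str, Any]], Any]


class ToyEnv:
    """Hypothetical environment holding a single configurable parameter."""

    def __init__(self) -> None:
        self.state: State = {"tx_power_dbm": 20.0}
        self.tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def call(self, name: str, **args: Any) -> Any:
        tool = self.tools[name]
        if tool.effect is Effect.READ_ONLY:
            # Read-only tools get a copy, so they cannot alter the real state.
            return tool.fn(dict(self.state), args)
        return tool.fn(self.state, args)


def get_kpis(state: State, args: Dict[str, Any]) -> Dict[str, float]:
    # Invented linear map standing in for a learned experiment model.
    power = state["tx_power_dbm"]
    return {"tx_power_dbm": power, "throughput_mbps": 5.0 * power}


def set_tx_power(state: State, args: Dict[str, Any]) -> Dict[str, bool]:
    state["tx_power_dbm"] = float(args["dbm"])
    return {"ok": True}


def make_env() -> ToyEnv:
    env = ToyEnv()
    env.register(Tool("get_kpis", Effect.READ_ONLY, get_kpis))
    env.register(Tool("set_tx_power", Effect.MUTATING, set_tx_power))
    return env


def run_episode(
    env: ToyEnv, target_mbps: float, max_steps: int = 10
) -> Tuple[bool, List[Tuple[str, Any]]]:
    """Observe, act, and re-observe until the target is met or steps run out."""
    trajectory: List[Tuple[str, Any]] = []
    for _ in range(max_steps):
        kpis = env.call("get_kpis")
        trajectory.append(("get_kpis", kpis))
        if kpis["throughput_mbps"] >= target_mbps:
            return True, trajectory
        # Trivial hand-written policy: raise power by 1 dB and look again.
        result = env.call("set_tx_power", dbm=kpis["tx_power_dbm"] + 1.0)
        trajectory.append(("set_tx_power", result))
    return False, trajectory
```

Calling `run_episode(make_env(), target_mbps=110.0)` returns a success flag together with the recorded sequence of tool calls and results. In a training setting, a trajectory of this kind is what a fine-tuning or reinforcement learning stage would consume, with a learned policy replacing the hand-written rule.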