🤖 AI Summary
This work addresses the inefficiency of spectrum management in large-scale wireless networks, where exploding action spaces and the computational intractability of conventional optimization methods hinder performance. To overcome these challenges, the authors propose a large language model (LLM)-driven spectrum access framework based on Group Relative Policy Optimization (GRPO). The framework introduces a hierarchical state serialization mechanism that combines global environment statistics with localized critical constraints, so that the LLM can reason over a high-dimensional state within a bounded context window. It further relies on code-driven reasoning and direct execution feedback to avoid the cold-start bottleneck of supervised fine-tuning (SFT). Simulations under strictly time-bounded inference show that the approach scales well, maintains robust spectral utility, and generalizes across network scales. It consistently outperforms non-deterministic heuristics and surpasses partitioned classical solvers in ultra-dense scenarios under matched compute budgets.
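As a rough illustration of the group-relative idea behind GRPO, the sketch below normalizes the rewards of several sampled candidates against the mean and standard deviation of their own group, which is how GRPO is commonly described. The function name, the use of the population standard deviation, and the reward values are assumptions for illustration only; the source does not specify the reward design or normalization details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled candidate relative to its own group.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero when all candidates receive the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical execution-feedback rewards for four candidates generated
# from the same prompt (for example, the utility obtained by running each one).
rewards = [0.0, 1.0, 1.0, 2.0]
advantages = group_relative_advantages(rewards)
```

Candidates that score above the group mean receive a positive advantage and those below it a negative one, so no separately trained value model is needed to provide a baseline.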
📝 Abstract
Efficient spectrum management in massive-scale wireless networks is increasingly challenged by exploding action spaces and the computational intractability of traditional optimization. This study proposes a Large-Scale LLM-Driven Spectrum Access (LSA) framework rooted in Group Relative Policy Optimization (GRPO). To overcome the computational collapse caused by ultra-long prompts in large-scale scenarios, we develop a hierarchical state serialization mechanism that synthesizes global environment statistics with localized critical constraints, enabling the LLM to perform high-dimensional reasoning within a bounded context window. Simulation results under strictly time-bounded inference protocols show that the code-driven paradigm eliminates the supervised fine-tuning (SFT) cold-start bottleneck and leverages direct execution feedback to scale more favorably. The framework maintains robust spectral utility and generalization across varying network scales, consistently outperforming non-deterministic heuristics and surpassing partitioned classical solvers in ultra-dense regimes under matched compute budgets.
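To make the hierarchical serialization idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the state is a mapping from link identifiers to interference levels, and it emits one line of global statistics plus the k most constrained links, so the prompt length stays bounded regardless of network size. The function name, field names, tags, and the synthetic data are all hypothetical.

```python
from statistics import mean


def serialize_state(interference, k=3):
    """Build a bounded-length prompt: global stats plus the k worst links.

    interference: dict mapping link id -> interference level
    (hypothetical representation and units).
    """
    values = list(interference.values())
    # Global tier: a fixed-size summary of the whole network.
    header = (
        f"links={len(values)} "
        f"mean_interference={mean(values):.2f} "
        f"max_interference={max(values):.2f}"
    )
    # Local tier: only the k links with the highest interference.
    worst = sorted(interference.items(), key=lambda kv: kv[1], reverse=True)[:k]
    detail = "; ".join(f"link {i}: {v:.2f}" for i, v in worst)
    return f"[GLOBAL] {header}\n[LOCAL top-{k}] {detail}"


# Synthetic 1000-link network with deterministic interference values.
interference = {i: (i * 37 % 101) / 10 for i in range(1000)}
prompt = serialize_state(interference, k=3)
```

The prompt grows with k rather than with the number of links, which is the property the abstract attributes to the mechanism. How the paper actually selects the "critical" local constraints is not stated in the source.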