🤖 AI Summary
To address the poor generalizability, weak robustness, and reliance on retraining inherent in conventional wireless MAC protocols, this paper proposes the first large language model (LLM)-based multi-agent MAC protocol framework. Methodologically, it innovatively models uplink and downlink transmissions as a dynamic multi-follower Stackelberg game, integrates an LLM-driven agent coordination mechanism, and employs Protocol Action Grammar (PAG) to ensure syntactic and semantic reliability of generated protocols. Crucially, the framework enables semantic-level adaptive protocol generation without retraining and maintains equilibrium convergence under dynamic user population changes. Simulation results demonstrate that, compared to traditional baselines, the proposed framework achieves a 77.6% increase in system throughput and a 65.2% improvement in fairness, while significantly enhancing both generalizability and robustness across diverse network conditions.
📝 Abstract
Medium Access Control (MAC) protocols, essential for wireless networks, are typically manually configured. While deep reinforcement learning (DRL)-based protocols enhance task-specified network performance, they suffer from poor generalizability and resilience, demanding costly retraining to adapt to dynamic environments. To overcome this limitation, we introduce a game-theoretic LLM-empowered multi-agent DRL (MARL) framework, in which the uplink transmission between a base station and a varying number of user equipments is modeled as a dynamic multi-follower Stackelberg game (MFSG), capturing the network's natural hierarchical structure. Within this game, LLM-driven agents, coordinated through proximal policy optimization (PPO), synthesize adaptive, semantic MAC protocols in response to network dynamics. Protocol action grammar (PAG) is employed to ensure the reliability and efficiency of this process. Under this system, we further analyze the existence and convergence behavior in terms of a Stackelberg equilibrium by studying the learning dynamics of LLM-empowered unified policies in response to changing followers. Simulations corroborate that our framework achieves a 77.6% greater throughput and a 65.2% fairness improvement over conventional baselines. Besides, our framework generalizes excellently to a fluctuating number of users without requiring retraining or architectural changes.