🤖 AI Summary
This work addresses the high channel state information (CSI) estimation overhead and the curse of dimensionality in centralized optimization for large-scale millimeter-wave networks by proposing a CSI-free hierarchical multi-agent reinforcement learning architecture. The approach replaces conventional channel estimation with user location information and employs a two-level controller hierarchy to cooperatively manage mechanically reconfigurable intelligent surfaces (RIS) for efficient beam focusing. By combining spatial intelligence with hierarchical decision-making, the method eliminates CSI acquisition overhead while preserving focusing accuracy. Implemented within a MAPPO framework using centralized training with decentralized execution (CTDE), the system achieves up to a 7.79 dB gain in received signal strength over centralized baselines in ray-tracing simulations, scales to multiple users, and remains robust under sub-meter-level localization errors.
📝 Abstract
Reconfigurable Intelligent Surfaces (RIS) have the potential to engineer smart radio environments for next-generation millimeter-wave (mmWave) networks. However, the prohibitive computational overhead of Channel State Information (CSI) estimation and the dimensionality explosion inherent in centralized optimization severely hinder practical large-scale deployments. To overcome these bottlenecks, we introduce a "CSI-free" paradigm powered by a Hierarchical Multi-Agent Reinforcement Learning (HMARL) architecture to control mechanically reconfigurable reflective surfaces. By substituting pilot-based channel estimation with accessible user localization data, our framework leverages spatial intelligence for macro-scale wave propagation management. The control problem is decomposed into a two-tier neural architecture: a high-level controller executes temporally extended, discrete user-to-reflector allocations, while low-level controllers autonomously optimize continuous focal points using Multi-Agent Proximal Policy Optimization (MAPPO) under a Centralized Training with Decentralized Execution (CTDE) scheme. Comprehensive deterministic ray-tracing evaluations demonstrate that this hierarchical framework achieves received signal strength indicator (RSSI) improvements of up to 7.79 dB over centralized baselines. Furthermore, the system exhibits robust multi-user scalability and maintains resilient beam-focusing performance under practical sub-meter localization tracking errors. By eliminating CSI overhead while maintaining high-fidelity signal redirection, this work establishes a scalable and cost-effective blueprint for intelligent wireless environments.
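The two-tier control loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the learned high-level policy and the MAPPO-trained low-level policies are replaced here by hand-written stand-ins (nearest-reflector allocation and centroid focusing), and all names (`allocate_users`, `choose_focal_point`, `run_episode`, `high_level_period`) are hypothetical. The sketch only shows the structure: decisions depend on user locations rather than CSI, the discrete allocation is held fixed for several steps, and each reflector picks its continuous focal point from its own assignment.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def allocate_users(users: List[Point], reflectors: List[Point]) -> Dict[int, int]:
    """High-level stand-in (assumption): assign each user to its nearest reflector.

    In the paper this role is played by a learned high-level controller.
    """
    return {
        u: min(range(len(reflectors)),
               key=lambda r: math.dist(users[u], reflectors[r]))
        for u in range(len(users))
    }


def choose_focal_point(reflector: Point, assigned: List[Point]) -> Point:
    """Low-level stand-in (assumption): focus on the centroid of assigned users.

    In the paper this role is played by a MAPPO-trained low-level policy.
    A reflector with no assigned users simply returns its own position.
    """
    if not assigned:
        return reflector
    n = len(assigned)
    return tuple(sum(p[i] for p in assigned) / n for i in range(3))


def run_episode(trajectory: List[List[Point]],
                reflectors: List[Point],
                high_level_period: int = 5):
    """Run the hierarchy over a sequence of user-location snapshots.

    The allocation is refreshed only every `high_level_period` steps
    (temporally extended action); focal points are updated every step.
    """
    allocation: Dict[int, int] = {}
    history = []
    for t, users in enumerate(trajectory):
        if t % high_level_period == 0:
            allocation = allocate_users(users, reflectors)
        focal_points = []
        for r, position in enumerate(reflectors):
            assigned = [users[u] for u, rr in allocation.items() if rr == r]
            focal_points.append(choose_focal_point(position, assigned))
        history.append((dict(allocation), focal_points))
    return history
```

Under CTDE, the low-level policies would be trained with a shared critic that sees global information, while at execution time each reflector acts only on its local observation, as `choose_focal_point` does here with its own assigned user locations.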