🤖 AI Summary
To address beam misalignment caused by user mobility in millimeter-wave (mmWave) mobile communications, this paper proposes a feedback-free, model-free sensing-assisted dynamic beam management scheme. The method uses real-time sensing echo data and angle-of-departure (AoD) estimation accuracy to adaptively switch between multi-beam and single-beam transmission, without requiring user feedback or prior knowledge of channel dynamics. Crucially, it is the first work to incorporate deep reinforcement learning (DRL) into a joint sensing-communication architecture for beam management. Performance is rigorously analyzed via angular discretization modeling and Cramér–Rao lower bound (CRLB)-driven evaluation. Results demonstrate significant throughput gains across diverse user velocities, combining high spectral efficiency with robustness to high-speed mobility and outperforming conventional beam sweeping and AoD-based heuristic approaches.
📝 Abstract
Mobile users are prone to beam failure caused by beam drifting in millimeter wave (mmWave) communications. Sensing can help alleviate beam drifting through timely beam changes at low overhead, since it does not need user feedback. This work studies the problem of optimizing sensing-aided communication by dynamically managing the beams allocated to mobile users. A multi-beam scheme is introduced, which allocates multiple beams to users that need an update of their angle of departure (AoD) estimates and a single beam to users whose AoD estimates already meet the required precision. A deep reinforcement learning (DRL) assisted method is developed to optimize the beam allocation policy, relying only upon the sensing echoes. For comparison, a heuristic AoD-based method that uses an approximated Cramér-Rao lower bound (CRLB) for allocation is also presented. Both methods require neither user feedback nor prior state evolution information. Results show that the DRL-assisted method achieves a considerable throughput gain over the conventional beam sweeping method and the AoD-based method, and that it is robust to different user speeds.
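The AoD-based heuristic can be pictured as a threshold rule on the approximated CRLB. The abstract does not state the exact rule, so the following is only a minimal sketch under assumptions: the names `UserState` and `allocate_beams`, the comparison of the CRLB-implied standard deviation against a fraction of the beamwidth, and the values of `ratio` and `multi_beam_count` are all hypothetical and not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class UserState:
    user_id: int
    aod_crlb_rad2: float  # approximated CRLB on the AoD estimate (variance, rad^2)


def allocate_beams(users, beamwidth_rad, ratio=0.25, multi_beam_count=3):
    """Return {user_id: number_of_beams} using a hypothetical threshold rule.

    A user gets multiple beams when the CRLB-implied standard deviation of
    its AoD estimate exceeds `ratio * beamwidth_rad` (assumed criterion),
    and a single beam otherwise.
    """
    allocation = {}
    for user in users:
        std_bound = math.sqrt(user.aod_crlb_rad2)
        needs_update = std_bound > ratio * beamwidth_rad
        allocation[user.user_id] = multi_beam_count if needs_update else 1
    return allocation


# Illustrative values only: one well-tracked user and one uncertain user.
users = [UserState(0, 1e-6), UserState(1, 4e-3)]
allocation = allocate_beams(users, beamwidth_rad=math.radians(5.0))
print(allocation)  # {0: 1, 1: 3}
```

In this sketch the decision depends only on quantities available at the transmitter, which mirrors the abstract's point that no user feedback is needed. The DRL-assisted method would replace the fixed threshold with a learned policy driven by the sensing echoes.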