🤖 AI Summary
Millimeter-wave beam alignment suffers from slow convergence in large beamspaces, and existing methods rely on bandit algorithms that assume unimodal or multimodal reward structures, assumptions often violated by real-world channel characteristics. This paper departs from such conventional structural assumptions by introducing phase retrieval, a signal recovery paradigm, into the stochastic multi-armed bandit framework. Leveraging the physical prior of channel multipath sparsity, we propose a parametric, physics-informed beam selection model. We design two algorithms, PreTC and PRGreedy, that jointly perform stochastic exploration and online path-parameter estimation, enabling efficient and robust beam optimization without strong structural assumptions. Extensive evaluations on the DeepMIMO and DeepSense6G datasets demonstrate that our methods significantly outperform state-of-the-art approaches in both static and mobile scenarios, achieving faster convergence and better generalization across diverse channel conditions.
📝 Abstract
In millimeter wave (mmWave) communications, beam alignment and tracking are crucial to combat the significant path loss. As scanning the entire directional space is inefficient, designing an efficient and robust method to identify the optimal beam directions is essential. Since traditional bandit algorithms require a long time horizon to converge under large beam spaces, many existing works propose efficient bandit algorithms for beam alignment that rely on unimodality or multimodality assumptions on the structure of the reward function. However, such assumptions often do not hold (or cannot be strictly satisfied) in practice, causing these algorithms to converge to suboptimal beams.
In this work, we propose two physics-informed bandit algorithms, *PreTC* and *PRGreedy*, that exploit the sparse multipath property of mmWave channels, a generic yet realistic assumption, which connects beam alignment to the Phase Retrieval Bandit problem. Our algorithms treat the parameters of each path as black boxes and maintain best-fit estimates of them from sampled historical rewards. *PreTC* starts with a random exploration phase and then commits to the optimal beam under the estimated reward function. *PRGreedy* performs the estimation in an online manner and always chooses the best beam under the current estimates. Both algorithms can also be easily adapted to beam tracking in the mobile setting. Through experiments using both the synthetic DeepMIMO dataset and the real-world DeepSense6G dataset, we demonstrate that both algorithms outperform existing approaches in a wide range of scenarios across diverse channel environments, showing their generalizability and robustness.
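To make the explore-then-commit structure of *PreTC* concrete, here is a minimal toy sketch. Everything in it is an illustrative assumption rather than the paper's method: the uniform-linear-array channel model, the codebook, and all function names are hypothetical, and the paper's phase-retrieval path-parameter estimation is replaced by simple per-beam empirical means.

```python
import numpy as np

def beam_gain(path_angles, path_gains, beam_dirs, n_ant=32):
    # Hypothetical toy model: received power |a(beam)^H h|^2 for a sparse
    # multipath channel h over a uniform linear array with n_ant elements.
    k = np.arange(n_ant)
    h = sum(g * np.exp(1j * np.pi * k * np.sin(a))
            for g, a in zip(path_gains, path_angles))
    # Codebook of steering vectors, one row per candidate beam direction.
    A = np.exp(1j * np.pi * np.outer(np.sin(beam_dirs), k))
    return np.abs(A.conj() @ h) ** 2 / n_ant

def pretc(reward_fn, n_beams, horizon, explore_frac=0.3, noise=0.1, rng=None):
    # Explore-then-commit sketch: sample beams uniformly at random for the
    # exploration phase, then commit to the empirically best beam.
    # (The paper fits path parameters via phase retrieval instead of
    # using raw empirical means; this is a simplified stand-in.)
    rng = np.random.default_rng(rng)
    t_explore = int(explore_frac * horizon)
    sums = np.zeros(n_beams)
    counts = np.zeros(n_beams)
    total_reward = 0.0
    for t in range(horizon):
        if t < t_explore:
            b = int(rng.integers(n_beams))          # random exploration
        else:
            b = int(np.argmax(sums / np.maximum(counts, 1)))  # commit
        r = reward_fn(b) + noise * rng.standard_normal()      # noisy reward
        sums[b] += r
        counts[b] += 1
        total_reward += r
    return int(np.argmax(sums / np.maximum(counts, 1))), total_reward
```

A usage example: build a two-path channel, compute true beam gains with `beam_gain`, and pass `lambda b: true[b]` as the reward function; with a modest horizon the committed beam matches the true best beam. *PRGreedy* would differ by re-fitting the estimates after every sample and acting greedily throughout, with no separate exploration phase.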