Physics-Informed Parametric Bandits for Beam Alignment in mmWave Communications

📅 2025-10-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Millimeter-wave beam alignment converges slowly in large beam spaces. Existing bandit-based methods speed it up by assuming a unimodal or multimodal reward structure, an assumption that real channels often violate. This paper instead connects beam alignment to the phase retrieval bandit problem within the stochastic multi-armed bandit framework. Using the sparse multipath structure of mmWave channels as a physical prior, it builds a parametric, physics-informed model of the reward for each beam. Two algorithms, PreTC and PRGreedy, estimate the path parameters from observed rewards: PreTC explores at random and then commits to the best beam under the fitted model, while PRGreedy re-estimates online and plays the best beam under the current estimate. Experiments on the synthetic DeepMIMO dataset and the real-world DeepSense6G dataset show that both algorithms outperform existing approaches in static and mobile scenarios across diverse channel conditions.

📝 Abstract
In millimeter wave (mmWave) communications, beam alignment and tracking are crucial to combat the significant path loss. As scanning the entire directional space is inefficient, designing an efficient and robust method to identify the optimal beam directions is essential. Since traditional bandit algorithms require a long time horizon to converge under large beam spaces, many existing works propose efficient bandit algorithms for beam alignment by relying on unimodality or multimodality assumptions on the reward function's structure. However, such assumptions often do not hold (or cannot be strictly satisfied) in practice, which causes such algorithms to converge to choosing suboptimal beams. In this work, we propose two physics-informed bandit algorithms, PreTC and PRGreedy, that exploit the sparse multipath property of mmWave channels - a generic but realistic assumption - which is connected to the Phase Retrieval Bandit problem. Our algorithms treat the parameters of each path as black boxes and maintain optimal estimates of them based on sampled historical rewards. PreTC starts with a random exploration phase and then commits to the optimal beam under the estimated reward function. PRGreedy performs such estimation in an online manner and chooses the best beam under current estimates. Our algorithms can also be easily adapted to beam tracking in the mobile setting. Through experiments using both the synthetic DeepMIMO dataset and the real-world DeepSense6G dataset, we demonstrate that both algorithms outperform existing approaches in a wide range of scenarios across diverse channel environments, showing their generalizability and robustness.
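
The two strategies described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a single propagation path, a half-wavelength uniform linear array, and a brute-force grid-search least-squares fit standing in for whatever estimator the paper uses. All names (`steering`, `beam_gains`, `fit_single_path`, `explore_then_commit`, `greedy`, `n_init`) are hypothetical. Rewards are magnitude-only beam powers, which is what ties the setup to phase retrieval.

```python
import numpy as np

def steering(n_ant, theta):
    """Unit-norm steering vector of an assumed half-wavelength ULA.
    theta is a spatial frequency in [-1, 1)."""
    return np.exp(1j * np.pi * theta * np.arange(n_ant)) / np.sqrt(n_ant)

def beam_gains(codebook, theta):
    """Noise-free gain |w_k^H a(theta)|^2 for every beam w_k (rows of codebook)."""
    return np.abs(codebook.conj() @ steering(codebook.shape[1], theta)) ** 2

def fit_single_path(codebook, arms, rewards, grid):
    """Assumed estimator: grid search over theta, closed-form least squares for power."""
    best = (np.inf, 0.0, grid[0])
    for theta in grid:
        g = beam_gains(codebook, theta)[arms]
        power = max((g @ rewards) / (g @ g + 1e-12), 0.0)
        err = np.sum((rewards - power * g) ** 2)
        if err < best[0]:
            best = (err, power, theta)
    return best[1], best[2]

def explore_then_commit(codebook, pull, n_explore, grid, rng):
    """PreTC-style sketch: pull random beams, fit the path, commit to the best beam."""
    arms = rng.integers(0, codebook.shape[0], size=n_explore)
    rewards = np.array([pull(k) for k in arms])
    power, theta = fit_single_path(codebook, arms, rewards, grid)
    return int(np.argmax(power * beam_gains(codebook, theta)))

def greedy(codebook, pull, horizon, grid, rng, n_init=3):
    """PRGreedy-style sketch: refit after every pull, play the current best beam.
    The first n_init random pulls are an assumption made here to seed the fit."""
    arms, rewards = [], []
    for t in range(horizon):
        if t < n_init:
            k = int(rng.integers(codebook.shape[0]))
        else:
            power, theta = fit_single_path(
                codebook, np.array(arms), np.array(rewards), grid)
            k = int(np.argmax(power * beam_gains(codebook, theta)))
        arms.append(k)
        rewards.append(pull(k))
    return arms
```

Here `pull(k)` is any callable returning the measured power of beam `k`, and `codebook` is a complex array with one beamforming vector per row. The point of the sketch is that only two parameters (power and angle) are fitted regardless of how many beams the codebook contains, so no unimodality assumption on the reward is needed.
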
Problem

Research questions and friction points this paper is trying to address.

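Make beam alignment in mmWave communications efficient without scanning the entire directional space
Overcome the slow convergence of traditional bandit algorithms in large beam spaces
Avoid unimodality or multimodality assumptions, which often fail in practice, by exploiting the sparse multipath property of mmWave channels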
Innovation

Methods, ideas, or system contributions that make the work stand out.

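Two physics-informed bandit algorithms, PreTC and PRGreedy, exploit the sparse multipath property of mmWave channels
Both algorithms treat path parameters as black boxes and estimate them from sampled historical rewards
Both outperform existing approaches on the DeepMIMO and DeepSense6G datasets across diverse channel environments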