🤖 AI Summary
To address the power–rate trade-off induced by low-resolution analog-to-digital converters (ADCs) in millimeter-wave (mmWave) multiple-input multiple-output (MIMO) systems, this paper proposes a framework that jointly optimizes the hybrid beamforming matrix and learnable ADC quantization thresholds. The authors are the first to bring deep reinforcement learning (DRL) to this setting; theoretically, their greedy policy is proven to converge to the globally optimal policy and is shown to be robust to time-varying channel statistics and noisy channel state information (CSI) estimates. The method couples a mutual-information-driven deep neural network estimator, used for reward computation, with policy-gradient reinforcement learning to enable end-to-end joint design. Experiments show that the approach performs close to exhaustive search, substantially reducing the rate loss, and remains stable under dynamic channel conditions, with reported spectral-efficiency gains of 12–18% over conventional heuristic methods.
📝 Abstract
Multiple-input multiple-output (MIMO) wireless systems conventionally use high-resolution analog-to-digital converters (ADCs) at the receiver to faithfully digitize received signals prior to digital signal processing. However, the power consumption of ADCs increases significantly as bandwidth increases, particularly in millimeter-wave communication systems. A combination of two mitigating approaches has been considered in the literature: i) using hybrid beamforming to reduce the number of ADCs, and ii) using low-resolution ADCs to reduce per-ADC power consumption. Lowering the number and resolution of the ADCs naturally reduces the communication rate of the system, leading to a tradeoff between ADC power consumption and communication rate. Prior works have shown that jointly optimizing the hybrid beamforming matrix and the ADC thresholds can reduce the aforementioned rate loss significantly. A key challenge is the complexity of optimizing over all choices of beamforming matrices and threshold vectors. This work proposes a reinforcement learning (RL) architecture to perform the optimization. The proposed approach integrates deep neural network-based mutual information estimators for reward calculation with policy gradient methods for reinforcement learning. The approach is robust to dynamic channel statistics and noisy CSI estimates. It is shown theoretically that greedy RL methods converge to the globally optimal policy. Extensive empirical evaluations demonstrate that the performance of the RL-based approach closely matches exhaustive search over the solution space.
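To make the abstract's recipe concrete, here is a minimal toy sketch of the core loop: a policy-gradient (REINFORCE) agent selects an ADC quantization threshold, and a mutual-information estimate of the quantizer output serves as the reward. This is not the paper's implementation — the deep-net MI estimator is replaced by a simple plug-in (empirical entropy) estimate for a one-bit quantizer of a scalar Gaussian input, and all names, sizes, and learning rates below are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy instance: pick a one-bit ADC threshold via policy-gradient
# RL, using an estimate of I(X; Q) as the reward. The paper's deep-net MI
# estimator is replaced here by a plug-in entropy estimate (illustrative only).

rng = np.random.default_rng(0)
thresholds = np.linspace(-1.5, 1.5, 7)   # candidate threshold "actions"

def mi_reward(t, n=4000):
    """Plug-in estimate of I(X; Q) for Q = 1{X > t}, X ~ N(0, 1).

    For a deterministic quantizer I(X; Q) = H(Q), so the reward is the
    empirical binary entropy of the quantizer output.
    """
    x = rng.standard_normal(n)
    p = (x > t).mean()
    h = 0.0
    for q in (p, 1.0 - p):
        if q > 0:
            h -= q * np.log2(q)
    return h

# REINFORCE over a softmax policy on the discrete threshold set,
# with an exponential-moving-average baseline to reduce gradient variance.
logits = np.zeros_like(thresholds)
baseline, lr = 0.0, 0.3
for _ in range(800):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    a = rng.choice(len(thresholds), p=probs)
    r = mi_reward(thresholds[a])
    baseline += 0.05 * (r - baseline)
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0                # gradient of log pi(a) w.r.t. logits
    logits += lr * (r - baseline) * grad_log_pi

best = thresholds[np.argmax(logits)]     # greedy action after training
```

For a symmetric Gaussian input, the zero threshold maximizes H(Q) at 1 bit, so the policy should concentrate near `best = 0.0`. The paper's full design jointly optimizes the beamforming matrix and a threshold vector over a much larger action space, which is where the RL formulation pays off relative to exhaustive search.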