🤖 AI Summary
This work addresses the online power control problem in energy harvesting communication systems operating over wireless fading channels and proposes a low-complexity, near-optimal solution. By approximating the relative-value function in the Bellman equation with a linear-policy-based form, the study introduces, for the first time, a truncated Lyapunov-like function form and uses it to design both optimistic and robust control policies. Notably, the robust policy requires at most five parameters, which makes it easy to interpret and practical to deploy. The approach further integrates this domain knowledge into reinforcement learning and combines it with a weighted directional waterfilling algorithm, substantially improving learning efficiency. Experimental results show that the proposed method incurs less than 2% performance loss relative to the optimal policy across diverse scenarios, at significantly lower computational complexity than existing solutions.
📝 Abstract
This paper investigates online power control for point-to-point energy harvesting communications over wireless fading channels. A linear-policy-based approximation is derived for the relative-value function in the Bellman equation of the power control problem. This approximation leads to two fundamental power control policies, the optimistic and the robust clipped affine policies, both of which take the form of a clipped affine function of the battery level and the reciprocal of the channel signal-to-noise ratio (SNR) coefficient. They are essentially battery-limited weighted directional waterfilling policies operating between adjacent time slots. By leveraging the relative-value approximation and the derived policies, a domain-knowledge-enhanced reinforcement learning (RL) algorithm is proposed for online power control. The proposed approach is further extended to scenarios with energy and/or channel lookahead. Comprehensive simulation results demonstrate that the proposed methods achieve a good balance between computational complexity and optimality. In particular, the robust clipped affine policy (combined with RL, using at most five parameters) outperforms all existing approaches across various scenarios, with less than 2% performance loss relative to the optimal policy.
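To make the phrase "clipped affine function of the battery level and the reciprocal of the channel SNR coefficient" concrete, the following is a minimal sketch of what such a policy could look like. It is not the paper's parameterization: the coefficient names `a`, `b`, and `c`, the sign convention on the inverse-SNR term (chosen to mirror the usual waterfilling form, water level minus inverse channel gain), and the optional cap `p_max` are all assumptions made for illustration.

```python
from typing import Optional


def clipped_affine_power(
    battery: float,
    inv_snr: float,
    a: float,
    b: float,
    c: float,
    p_max: Optional[float] = None,
) -> float:
    """Hypothetical clipped affine power policy (illustrative only).

    battery : energy currently stored in the battery
    inv_snr : reciprocal of the channel SNR coefficient
    a, b, c : assumed affine coefficients (not taken from the paper)
    p_max   : optional assumed cap on transmit power
    """
    # Affine part: grows with the battery level and shrinks as the
    # channel gets worse (larger inverse SNR), as in waterfilling.
    raw = a * battery - b * inv_snr + c

    # Clip from above by the available energy (battery-limited) and,
    # if given, by the power cap; clip from below by zero.
    upper = battery if p_max is None else min(battery, p_max)
    return max(0.0, min(raw, upper))
```

With these assumed coefficients, a good channel and a full battery yield an interior value, a very poor channel drives the power to zero, and a large offset `c` is limited by the battery level. An RL procedure such as the one the abstract describes would tune a small number of coefficients of this kind rather than learn an unconstrained policy.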