A Lyapunov Drift-Plus-Penalty Method Tailored for Reinforcement Learning with Queue Stability

📅 2025-06-04
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This paper addresses online decision optimization under queue-stability constraints in Internet-of-Things (IoT) systems. Method: The authors propose LDPTRLQ, which integrates the Lyapunov drift-plus-penalty framework with reinforcement learning (RL) through rigorous theoretical analysis under common, realistic assumptions, departing from the conventional "stabilize-then-optimize" practice of directly merging the two frameworks. LDPTRLQ embeds queue stability as a constraint within the RL policy update, guaranteeing Lyapunov stability while maximizing long-term cumulative reward. Contribution/Results: Theoretical analysis establishes convergence and stability guarantees, and evaluations across diverse IoT simulation tasks show that LDPTRLQ outperforms pure Lyapunov-based methods, standalone RL approaches, and state-of-the-art baselines in convergence speed, queue stability, and policy performance.
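For context, the classical drift-plus-penalty framework referenced above greedily minimizes, at each slot, the Lyapunov drift plus a weighted penalty. The following is a minimal sketch in generic notation; the symbols Q_i(t), p(t), and V are standard in the literature and are not necessarily the paper's own.

```latex
% Quadratic Lyapunov function of the queue backlogs Q_i(t)
L(\mathbf{Q}(t)) = \tfrac{1}{2} \sum_{i} Q_i(t)^2
% One-slot conditional Lyapunov drift
\Delta(\mathbf{Q}(t)) = \mathbb{E}\!\left[ L(\mathbf{Q}(t+1)) - L(\mathbf{Q}(t)) \mid \mathbf{Q}(t) \right]
% Per-slot control: greedily minimize drift plus V times the penalty p(t),
% where V >= 0 trades penalty minimization against queue stability
\min_{a(t)} \; \Delta(\mathbf{Q}(t)) + V \, \mathbb{E}\!\left[ p(t) \mid \mathbf{Q}(t) \right]
```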

📝 Abstract
With the proliferation of Internet of Things (IoT) devices, the demand for solving complex optimization problems has intensified. The Lyapunov drift-plus-penalty algorithm is a widely adopted approach for ensuring queue stability, and some research has preliminarily explored its integration with reinforcement learning (RL). In this paper, we investigate the adaptation of the Lyapunov drift-plus-penalty algorithm to RL applications and, through rigorous theoretical analysis, derive an effective method for combining the two under a set of common and reasonable conditions. Unlike existing approaches that directly merge the two frameworks, our proposed algorithm, termed the Lyapunov drift-plus-penalty method tailored for reinforcement learning with queue stability (LDPTRLQ), offers theoretical advantages by effectively balancing the greedy optimization of the drift-plus-penalty method with the long-term perspective of RL. Simulation results on multiple problems demonstrate that LDPTRLQ outperforms baseline methods built on the Lyapunov drift-plus-penalty method and on RL, corroborating our theoretical derivations. The results also show that the proposed algorithm outperforms other benchmarks in terms of compatibility and stability.
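The abstract does not spell out how the drift term enters the RL update. The sketch below is one illustrative way a drift-plus-penalty term can be folded into a per-step RL reward over a toy single-queue model; it is not the paper's implementation, and the queue dynamics, cost function, parameter V, and the random action (a stand-in for an RL policy) are all assumptions made here for illustration.

```python
import numpy as np

# Illustrative sketch only (not the paper's algorithm): folding a Lyapunov
# drift-plus-penalty term into the per-step reward an RL agent would maximize.
rng = np.random.default_rng(0)

def lyapunov(q):
    """Quadratic Lyapunov function of the queue backlog."""
    return 0.5 * q ** 2

def shaped_reward(power_cost, q_before, q_after, V=10.0):
    """Negative drift-plus-penalty: maximizing it discourages queue growth
    while still penalizing the raw per-slot cost, weighted by V."""
    drift = lyapunov(q_after) - lyapunov(q_before)
    return -(drift + V * power_cost)

# Toy single-queue slot model: Poisson arrivals, action = service rate in {0,1,2},
# per-slot cost grows quadratically with the chosen service rate.
q = 0.0
total_reward = 0.0
for t in range(1000):
    arrival = rng.poisson(1.0)
    action = rng.integers(0, 3)        # placeholder for an RL policy's action
    service = float(action)
    power_cost = 0.5 * service ** 2
    q_next = max(q + arrival - service, 0.0)
    total_reward += shaped_reward(power_cost, q, q_next)
    q = q_next

print(f"final backlog={q:.1f}, cumulative shaped reward={total_reward:.1f}")
```

In a full RL setup, the shaped reward above would replace the raw per-slot cost in the agent's update rule, so that the learned policy trades off long-term return against queue growth rather than optimizing each slot in isolation.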
Problem

Research questions and friction points this paper is trying to address.

Adapting Lyapunov Drift-Plus-Penalty for reinforcement learning applications
Balancing greedy optimization with long-term RL perspective
Ensuring queue stability in IoT-driven optimization challenges
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates Lyapunov Drift-Plus-Penalty with RL
Ensures queue stability in optimization challenges
Balances greedy optimization and long-term RL perspective
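For context on the greedy-versus-long-term balance listed above: the classical drift-plus-penalty method (without RL) is known to achieve a time-average penalty within O(1/V) of optimal at the price of an O(V) time-average queue backlog. The bounds below are a sketch of this standard trade-off in generic notation; they are properties of the classical framework rather than the paper's own results (B bounds second moments of arrivals and services, \epsilon > 0 is a stability slack, and p^\ast is the optimal time-average penalty).

```latex
% Standard drift-plus-penalty trade-off (classical framework, generic notation)
\limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}[p(t)]
  \;\le\; p^{\ast} + \frac{B}{V}
  \qquad \text{(penalty gap shrinks as } V \text{ grows)}
\limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \sum_{i} \mathbb{E}[Q_i(t)]
  \;\le\; \frac{B + V \, (p_{\max} - p^{\ast})}{\epsilon}
  \qquad \text{(average backlog grows linearly in } V\text{)}
```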
Wenhan Xu
Internet of Things Thrust, Hong Kong University of Science and Technology (Guangzhou), Guangzhou, Guangdong 511400, China
Jiashuo Jiang
Hong Kong University of Science and Technology
operations research, operations management, optimization, approximation algorithms, machine learning
Lei Deng
Internet of Things Thrust, Hong Kong University of Science and Technology (Guangzhou), Guangzhou, Guangdong 511400, China
Danny Hin-Kwok Tsang
Internet of Things Thrust, Hong Kong University of Science and Technology (Guangzhou), Guangzhou, Guangdong 511400, China, and also with the Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR, China