🤖 AI Summary
This paper addresses online decision optimization under queue stability constraints in Internet-of-Things (IoT) systems. Method: The authors propose LDPTRLQ, an algorithm that deeply integrates the Lyapunov drift-plus-penalty framework with reinforcement learning (RL) under realistic assumptions, departing from the conventional "stabilize-then-optimize" paradigm. Rather than merging the two frameworks directly, LDPTRLQ embeds queue stability as a constraint within the RL policy update, guaranteeing Lyapunov stability while maximizing long-term cumulative reward. Contribution/Results: Theoretical analysis establishes convergence and stability guarantees. Evaluations across diverse IoT simulation tasks show that LDPTRLQ outperforms pure Lyapunov-based methods, standalone RL approaches, and state-of-the-art baselines in convergence speed, queue stability, and policy performance.
📝 Abstract
With the proliferation of Internet of Things (IoT) devices, the demand for solving complex online optimization problems has intensified. The Lyapunov drift-plus-penalty algorithm is a widely adopted approach for ensuring queue stability, and prior work has made preliminary attempts to combine it with reinforcement learning (RL). In this paper, we investigate how to adapt the Lyapunov drift-plus-penalty algorithm for RL, and through rigorous theoretical analysis derive an effective way to combine the two under a set of common and reasonable conditions. Unlike existing approaches that directly merge the two frameworks, our proposed algorithm, the Lyapunov drift-plus-penalty method tailored for reinforcement learning with queue stability (LDPTRLQ), offers theoretical advantages by balancing the greedy, per-slot optimization of drift-plus-penalty against the long-term perspective of RL. Simulation results on multiple problems show that LDPTRLQ outperforms baselines built on the Lyapunov drift-plus-penalty method and on RL alone, corroborating our theoretical derivations. The results also show that LDPTRLQ surpasses the other benchmarks in compatibility and stability.
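To make the trade-off the abstract refers to concrete, the standard Lyapunov drift-plus-penalty rule (the classical framework the paper builds on, not the paper's LDPTRLQ itself) greedily picks, each time slot, the action minimizing V·penalty(a) + Σᵢ Qᵢ·(arrivalᵢ(a) − serviceᵢ(a)), where V weights the penalty against queue drift. The sketch below is a minimal illustration under assumed names and data structures; none of the identifiers come from the paper.

```python
import numpy as np

def drift_plus_penalty_action(queues, arrivals, service, penalty, V):
    """Greedy per-slot action selection under the classical Lyapunov
    drift-plus-penalty framework (illustrative, not the paper's LDPTRLQ).

    queues   : current backlog Q_i for each of the N queues, shape (N,)
    arrivals : arrival rate per queue under each action, shape (A, N)
    service  : service rate per queue under each action, shape (A, N)
    penalty  : per-slot penalty (e.g., energy cost) of each action, shape (A,)
    V        : trade-off weight; larger V favors low penalty over low drift
    """
    best_a, best_score = None, np.inf
    for a in range(len(penalty)):
        # Linearized Lyapunov drift term: sum_i Q_i * (arrival_i - service_i)
        drift = np.dot(queues, arrivals[a] - service[a])
        score = V * penalty[a] + drift
        if score < best_score:
            best_a, best_score = a, score
    return best_a

# Usage: action 1 serves the queues but costs penalty 2; action 0 is free
# but serves nothing. Small V drains queues, large V avoids the penalty.
queues = np.array([5.0, 2.0])
arrivals = np.array([[1.0, 1.0], [1.0, 1.0]])
service = np.array([[0.0, 0.0], [3.0, 3.0]])
penalty = np.array([0.0, 2.0])
print(drift_plus_penalty_action(queues, arrivals, service, penalty, V=1.0))
print(drift_plus_penalty_action(queues, arrivals, service, penalty, V=100.0))
```

This per-slot greediness is exactly the myopia the paper contrasts with RL's long-horizon objective: the rule only looks one slot ahead, which LDPTRLQ aims to remedy by folding the stability constraint into the RL policy update.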