🤖 AI Summary
This paper addresses choice-based network revenue management, a canonical continuous-time intensity control problem, by removing the traditional requirement of prespecifying a time discretization. Leveraging the intrinsic discreteness of the jump points of the intensity process, we establish a theoretical reinforcement learning (RL) framework for intensity control directly in continuous time. We propose discretization-free Monte Carlo and temporal-difference policy evaluation methods, along with a policy-gradient-based actor-critic algorithm. Our approach eliminates discretization-induced bias and computational redundancy, substantially improving policy-evaluation accuracy and training efficiency. Extensive benchmark experiments demonstrate that the proposed algorithm consistently outperforms existing discretized RL methods and classical heuristics in both revenue and computational cost. This work provides a scalable, theoretically grounded paradigm for continuous-time dynamic optimization.
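The key observation above, that an intensity process moves only at its jump points, means a sample path can be generated exactly by sampling the jump epochs themselves rather than stepping along a time grid. A minimal sketch for the homogeneous Poisson case (the rate and horizon values here are illustrative, not from the paper):

```python
import random

def jump_times(rate, horizon, seed=0):
    """Sample the exact jump epochs of a homogeneous Poisson process on
    [0, horizon] via exponential interarrival times -- no time grid needed.
    Between consecutive epochs nothing happens, so these times fully
    determine the sample path."""
    rng = random.Random(seed)
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)  # exact gap to the next jump
        if t > horizon:
            return times
        times.append(t)
```

A grid-based simulator with step size h would instead flip a Bernoulli(rate * h) coin in every interval, introducing an O(h) bias and wasting computation on the many intervals with no event; sampling the jumps directly avoids both.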
📝 Abstract
Intensity control is a class of continuous-time dynamic optimization problems with many important applications in operations research, including queueing and revenue management. In this study, we adapt the reinforcement learning framework to intensity control, using choice-based network revenue management as a case study: a classical problem in revenue management that features a large state space, a large action space, and a continuous time horizon. We show that by exploiting the inherent discretization of the sample paths created by the jump points, a unique and defining feature of intensity control, one does not need to discretize the time horizon in advance, as was previously believed necessary because most reinforcement learning algorithms are designed for discrete-time problems. As a result, computation is simplified and the discretization error is significantly reduced. We lay the theoretical foundation for Monte Carlo and temporal-difference learning algorithms for policy evaluation and develop policy-gradient-based actor-critic algorithms for intensity control. Through a comprehensive numerical study, we demonstrate the benefit of our approach over other state-of-the-art benchmarks.
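To make the discretization-free policy evaluation concrete, the following is a hedged sketch of Monte Carlo evaluation for a toy single-product revenue model (the model, the exponential purchase probability, and all parameter values are illustrative assumptions, not the paper's network formulation): episodes are simulated jump by jump, so the estimator carries no time-grid bias.

```python
import math
import random

def mc_expected_revenue(price, rate=2.0, horizon=10.0,
                        capacity=5, n_episodes=2000, seed=0):
    """Discretization-free Monte Carlo policy evaluation for a toy model:
    customers arrive as a Poisson process with intensity `rate`; a customer
    facing `price` buys one unit with probability exp(-price); the episode
    return is the total revenue collected before the horizon or stock-out.
    The state changes only at arrival epochs, so each episode advances
    directly from jump to jump via exact exponential interarrival times."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_episodes):
        t, stock, revenue = 0.0, capacity, 0.0
        while stock > 0:
            t += rng.expovariate(rate)          # exact next-arrival time
            if t > horizon:
                break
            if rng.random() < math.exp(-price):  # toy choice probability
                stock -= 1
                revenue += price
        total += revenue
    return total / n_episodes
```

A grid-based counterpart would loop over horizon / h steps per episode regardless of how many arrivals occur; here the per-episode cost scales with the (random) number of jumps, which is the source of the computational savings described above.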