🤖 AI Summary
This paper addresses choice-based network revenue management, a canonical continuous-time intensity control problem, by removing the traditional requirement of prespecifying a time discretization. Leveraging the intrinsic discreteness of the jump points of the intensity process, we establish a theoretical reinforcement learning (RL) framework for intensity control directly in continuous time. We propose discretization-free Monte Carlo and temporal-difference policy evaluation methods, along with a policy-gradient-based actor-critic algorithm. Our approach eliminates discretization-induced bias and computational redundancy, substantially improving policy-evaluation accuracy and training efficiency. Extensive benchmark experiments demonstrate that the proposed algorithm consistently outperforms existing discretized RL methods and classical heuristics in both revenue and computational cost. This work provides a scalable, theoretically grounded paradigm for continuous-time dynamic optimization.
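The key observation above, that an intensity process moves only at its jump points, means a sample path can be generated exactly by sampling the jump epochs themselves rather than stepping along a time grid. A minimal sketch for the homogeneous Poisson case (the rate and horizon values here are illustrative, not from the paper):

```python
import random

def jump_times(rate, horizon, seed=0):
    """Sample the exact jump epochs of a homogeneous Poisson process on
    [0, horizon] via exponential interarrival times -- no time grid needed.
    Between consecutive epochs nothing happens, so these times fully
    determine the sample path."""
    rng = random.Random(seed)
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)  # exact gap to the next jump
        if t > horizon:
            return times
        times.append(t)
```

A grid-based simulator with step size h would instead flip a Bernoulli(rate * h) coin in every interval, introducing an O(h) bias and wasting computation on the many intervals with no event; sampling the jumps directly avoids both.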
📝 Abstract
Intensity control is a class of continuous-time dynamic optimization problems with many important applications in operations research, including queueing and revenue management. In this study, we adapt the reinforcement learning framework to intensity control, using choice-based network revenue management as a case study: a classical problem in revenue management that features a large state space, a large action space, and a continuous time horizon. We show that by exploiting the inherent discretization of the sample paths created by the jump points, a unique and defining feature of intensity control, one does not need to discretize the time horizon in advance, as was previously believed necessary because most reinforcement learning algorithms are designed for discrete-time problems. As a result, computation is simplified and the discretization error is significantly reduced. We lay the theoretical foundation for Monte Carlo and temporal-difference learning algorithms for policy evaluation and develop policy-gradient-based actor-critic algorithms for intensity control. Through a comprehensive numerical study, we demonstrate the benefit of our approach over other state-of-the-art benchmarks.
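To make the discretization-free policy evaluation concrete, the following is a hedged sketch of Monte Carlo evaluation for a toy single-product revenue model (the model, the exponential purchase probability, and all parameter values are illustrative assumptions, not the paper's network formulation): episodes are simulated jump by jump, so the estimator carries no time-grid bias.

```python
import math
import random

def mc_expected_revenue(price, rate=2.0, horizon=10.0,
                        capacity=5, n_episodes=2000, seed=0):
    """Discretization-free Monte Carlo policy evaluation for a toy model:
    customers arrive as a Poisson process with intensity `rate`; a customer
    facing `price` buys one unit with probability exp(-price); the episode
    return is the total revenue collected before the horizon or stock-out.
    The state changes only at arrival epochs, so each episode advances
    directly from jump to jump via exact exponential interarrival times."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_episodes):
        t, stock, revenue = 0.0, capacity, 0.0
        while stock > 0:
            t += rng.expovariate(rate)          # exact next-arrival time
            if t > horizon:
                break
            if rng.random() < math.exp(-price):  # toy choice probability
                stock -= 1
                revenue += price
        total += revenue
    return total / n_episodes
```

A grid-based counterpart would loop over horizon / h steps per episode regardless of how many arrivals occur; here the per-episode cost scales with the (random) number of jumps, which is the source of the computational savings described above.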