Markov Decision Processing Networks

📅 2025-09-29
🤖 AI Summary
This work studies multiclass queueing networks in which service is a controlled, finite-state Markov process, so the actions taken now affect future service availability. Such models arise in assemble-to-order systems, ride-hailing platforms, cross-skilled call centers, and quantum switches. The authors show by counterexample that the classical MaxWeight policy is not throughput-optimal in this setting. Unlike in classical switched networks, the capacity region they characterize depends on the long-run mix of service states induced by the control. To achieve it, they design a weighted average reward policy, formulated as a multiobjective Markov decision process (MDP) that exploits a two-timescale separation at the fluid scale, and prove that the policy is throughput-optimal. The techniques give a clear description of the capacity region and apply to a broad family of two-sided matching systems.

📝 Abstract
We introduce Markov Decision Processing Networks (MDPNs) as a multiclass queueing network model where service is a controlled, finite-state Markov process. The model exhibits a decision-dependent service process where actions taken influence future service availability. Viewed as a two-sided queueing model, this captures settings such as assemble-to-order systems, ride-hailing platforms, cross-skilled call centers, and quantum switches. We first characterize the capacity region of MDPNs. Unlike classical switched networks, the MDPN capacity region depends on the long-run mix of service states induced by the control of the underlying service process. We show, via a counterexample, that MaxWeight is not throughput-optimal in this class, demonstrating the distinction between MDPNs and classical queueing models. To bridge this gap, we design a weighted average reward policy, a multiobjective MDP that leverages a two-timescale separation at the fluid scale. We prove throughput-optimality of the resulting policy. The techniques yield a clear capacity region description and apply to a broad family of two-sided matching systems.

Problem

Research questions and friction points this paper is trying to address.

Modeling multiclass queueing networks with controlled Markov service processes
Characterizing a capacity region that depends on how the service states are controlled
Designing throughput-optimal policies for two-sided matching systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiclass queueing network model in which service is a controlled, finite-state Markov process
Decision-dependent service process in which actions influence future service availability
Weighted average reward policy that is proved throughput-optimal
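
The abstract's point that capacity depends on the long-run mix of service states induced by the control can be illustrated with a toy example. The sketch below is not the paper's model or policy and does not reproduce its MaxWeight counterexample. It assumes a hypothetical single server with two states ("ready" and "recovering"), a hypothetical stationary policy that serves with probability `p_serve` when ready, and a hypothetical recovery probability `r_recover`. All function and parameter names are illustrative.

```python
def policy_chain(p_serve, r_recover):
    """Build the chain induced by a stationary policy on a toy 2-state server.

    Hypothetical states: 0 = ready, 1 = recovering.
    In state 0 the policy serves with probability p_serve; serving completes
    one job and moves the server to state 1. In state 1 no service is possible
    and the server returns to state 0 with probability r_recover per slot.
    """
    P = [[1.0 - p_serve, p_serve],
         [r_recover, 1.0 - r_recover]]
    service = [p_serve, 0.0]  # expected jobs served per slot in each state
    return P, service


def stationary_distribution(P, iters=5000):
    """Approximate the stationary distribution of an irreducible chain.

    Iterates the lazy chain (I + P) / 2, which has the same stationary
    distribution as P but avoids oscillation when P is periodic.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        step = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        pi = [0.5 * a + 0.5 * b for a, b in zip(pi, step)]
    return pi


def long_run_service_rate(p_serve, r_recover):
    """Long-run jobs served per slot: stationary mix weighted by service."""
    P, service = policy_chain(p_serve, r_recover)
    pi = stationary_distribution(P)
    return sum(w * s for w, s in zip(pi, service))


if __name__ == "__main__":
    # The same server supports different long-run rates under different controls.
    for p in (0.25, 0.5, 1.0):
        print(p, long_run_service_rate(p, r_recover=0.5))
```

In this toy chain the long-run rate works out to `p_serve * r_recover / (p_serve + r_recover)`, so with `p_serve = 1.0` and `r_recover = 0.5` the server sustains about 1/3 of a job per slot. The set of arrival rates a single queue could tolerate here is therefore determined by the policy's stationary mix of states, not by a fixed per-slot service rate.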