ID policy (with reassignment) is asymptotically optimal for heterogeneous weakly-coupled MDPs

📅 2025-02-09
🤖 AI Summary
This paper addresses long-run average reward optimization in fully heterogeneous weakly-coupled Markov decision processes (WCMDPs), where each of the $N$ arms has distinct model parameters and the problem suffers from the curse of dimensionality when $N$ is large. It shows that, under mild assumptions, a natural adaptation of the ID policy (with reassignment), originally proposed for a homogeneous special case of WCMDPs, achieves an $O(1/\sqrt{N})$ optimality gap in long-run average reward per arm. This is the first asymptotic optimality result for fully heterogeneous average-reward WCMDPs. The key technical tool is a novel projection-based Lyapunov function that witnesses the convergence of rewards and costs to an optimal region in the presence of heterogeneity.

📝 Abstract
Heterogeneity poses a fundamental challenge for many real-world large-scale decision-making problems but remains largely understudied. In this paper, we study the fully heterogeneous setting of a prominent class of such problems, known as weakly-coupled Markov decision processes (WCMDPs). Each WCMDP consists of $N$ arms (or subproblems), which have distinct model parameters in the fully heterogeneous setting, leading to the curse of dimensionality when $N$ is large. We show that, under mild assumptions, a natural adaptation of the ID policy, although originally proposed for a homogeneous special case of WCMDPs, in fact achieves an $O(1/\sqrt{N})$ optimality gap in long-run average reward per arm for fully heterogeneous WCMDPs as $N$ becomes large. This is the first asymptotic optimality result for fully heterogeneous average-reward WCMDPs. Our techniques highlight the construction of a novel projection-based Lyapunov function, which witnesses the convergence of rewards and costs to an optimal region in the presence of heterogeneity.
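One way to read the $O(1/\sqrt{N})$ claim, in notation assumed here for illustration rather than taken from the paper: let $r_i$ be the reward function of arm $i$, let $S_i(t)$ and $A_i(t)$ be its state and action at time $t$, and write the long-run average reward per arm under a policy $\pi$ as

$$
R(\pi, N) = \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \frac{1}{N} \sum_{i=1}^{N} \mathbb{E}_{\pi}\left[ r_i\big(S_i(t), A_i(t)\big) \right].
$$

With $R^*(N)$ denoting the best achievable value of $R(\pi, N)$ over all feasible policies, an $O(1/\sqrt{N})$ optimality gap would mean that there is a constant $C$, not depending on $N$, such that

$$
R^*(N) - R(\pi, N) \le \frac{C}{\sqrt{N}} \quad \text{for all sufficiently large } N.
$$

The symbols $R$, $R^*$, $r_i$, $S_i$, $A_i$, and $C$ are placeholders, and the sketch assumes the limit exists; the paper's own definitions may differ in detail.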
Problem

Research questions and friction points this paper is trying to address.

Addresses heterogeneity in decision-making
Optimizes weakly-coupled MDPs
Reduces dimensionality curse for large N
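
The curse of dimensionality mentioned above can be made concrete with a small count. The following is a minimal sketch, assuming each arm has its own finite state and action sets and its own transition matrices; the function names and the example sizes are hypothetical and do not come from the paper. It compares the number of joint states, which multiplies across arms, with the number of transition entries needed when arms are described separately, which only adds up across arms.

```python
from math import prod


def joint_state_count(states_per_arm):
    """Number of joint states: the product of the per-arm state counts."""
    return prod(states_per_arm)


def per_arm_parameter_count(states_per_arm, actions_per_arm):
    """Transition entries when each arm is stored separately.

    Arm i contributes |A_i| * |S_i|^2 entries (one |S_i| x |S_i|
    transition matrix per action), so the total grows additively in N.
    """
    return sum(a * s * s for s, a in zip(states_per_arm, actions_per_arm))


# Hypothetical instance: 20 arms, each with 3 states and 2 actions.
sizes = [3] * 20
actions = [2] * 20
print(joint_state_count(sizes))                 # 3**20 = 3486784401
print(per_arm_parameter_count(sizes, actions))  # 20 * 2 * 9 = 360
```

In a fully heterogeneous instance the per-arm entries cannot be shared across arms, yet the per-arm description still grows only linearly in $N$ while the joint state space grows exponentially.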
Innovation

Methods, ideas, or system contributions that make the work stand out.

ID policy adaptation
Projection-based Lyapunov function
Asymptotic optimality