ID policy (with reassignment) is asymptotically optimal for heterogeneous weakly-coupled MDPs

📅 2025-02-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses long-run average reward optimization in fully heterogeneous weakly-coupled Markov decision processes (WCMDPs), where each of the $N$ arms has distinct model parameters and the problem suffers from the curse of dimensionality when $N$ is large. It shows that, under mild assumptions, a natural adaptation of the ID policy, originally proposed for a homogeneous special case of WCMDPs, achieves an $O(1/\sqrt{N})$ optimality gap in long-run average reward per arm. The key technical tool is a novel projection-based Lyapunov function that witnesses the convergence of rewards and costs to an optimal region in the presence of heterogeneity. According to the authors, this is the first asymptotic optimality result for fully heterogeneous average-reward WCMDPs.

📝 Abstract

Heterogeneity poses a fundamental challenge for many real-world large-scale decision-making problems but remains largely understudied. In this paper, we study the fully heterogeneous setting of a prominent class of such problems, known as weakly-coupled Markov decision processes (WCMDPs). Each WCMDP consists of $N$ arms (or subproblems), which have distinct model parameters in the fully heterogeneous setting, leading to the curse of dimensionality when $N$ is large. We show that, under mild assumptions, a natural adaptation of the ID policy, although originally proposed for a homogeneous special case of WCMDPs, in fact achieves an $O(1/\sqrt{N})$ optimality gap in long-run average reward per arm for fully heterogeneous WCMDPs as $N$ becomes large. This is the first asymptotic optimality result for fully heterogeneous average-reward WCMDPs. Our techniques highlight the construction of a novel projection-based Lyapunov function, which witnesses the convergence of rewards and costs to an optimal region in the presence of heterogeneity.
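
As a reading aid, the headline claim can be written as a single formula. The notation is an assumption for illustration and is not taken from the abstract: $R^*(N)$ stands for the optimal long-run average reward per arm in the $N$-arm problem, and $R(\pi^{\mathrm{ID}}, N)$ for the long-run average reward per arm achieved by the adapted ID policy.

$$
R^*(N) - R(\pi^{\mathrm{ID}}, N) = O\!\left(\frac{1}{\sqrt{N}}\right) \quad \text{as } N \to \infty .
$$

Under this reading, the bound on the per-arm gap shrinks by a factor of two each time $N$ is quadrupled, up to the constant hidden in the $O(\cdot)$ notation.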

Problem

Research questions and friction points this paper is trying to address.

Addresses full heterogeneity across arms in large-scale decision-making

Optimizes long-run average reward in weakly-coupled MDPs

Handles the curse of dimensionality for large $N$

Innovation

Methods, ideas, or system contributions that make the work stand out.

ID policy adaptation

Projection-based Lyapunov function

Asymptotic optimality
