🤖 AI Summary
This study addresses online multi-robot task allocation under asymmetric stochastic task arrivals and switching delays by formulating it as a discounted-cost Markov decision process. The authors propose a structure-aware actor-critic reinforcement learning approach that enforces an exhaustive service policy by construction and restricts learning to the next-queue assignment for idle robots only. This design lifts the limitation of conventional longest-queue-first rules such as exhaustive-serve-longest (ESL), whose optimality is known only in symmetric settings, and adapts to asymmetric arrival rates while embedding the known structural properties of optimal policies. Experiments show that the proposed method consistently outperforms the ESL baseline across diverse server-to-location ratios, system loads, and asymmetry levels, achieving lower discounted holding costs and shorter average queue lengths, with performance close to the theoretical optimum where it can be computed.
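The key architectural restriction, learning only the next-queue choice for idle robots, can be pictured with a minimal sketch. This is illustrative only: the function `act`, the scalar parameter `theta`, and the softmax-over-queue-length feature are assumptions for exposition, not the paper's actual actor network.

```python
import math, random

def act(queues, robot_locs, theta, rng=random):
    # Illustrative sketch, not the paper's architecture: the actor only
    # chooses a next queue for idle robots; busy robots are forced to
    # keep serving, so the policy is exhaustive by construction.
    actions = []
    for loc in robot_locs:
        if queues[loc] > 0:
            actions.append(loc)          # forced: continue exhaustive service
        else:
            # Learned part: softmax over a toy feature (theta * queue length);
            # a real actor would use richer state features and learned weights.
            prefs = [math.exp(theta * q) for q in queues]
            z = sum(prefs)
            r, acc, dest = rng.random(), 0.0, len(queues) - 1
            for i, p in enumerate(prefs):
                acc += p / z
                if r <= acc:
                    dest = i
                    break
            actions.append(dest)         # learned next-queue assignment
    return actions
```

Because busy robots never enter the learned branch, the action space the actor must explore shrinks to the idle-robot decisions, which is what makes the structural constraint useful for learning.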
📝 Abstract
We study online task allocation for multi-robot, multi-queue systems with asymmetric stochastic arrivals and switching delays. The problem is modeled in discrete time: each location can host at most one robot per slot, servicing a task consumes one slot, switching between locations incurs a one-slot travel delay, and arrivals at the locations are independent Bernoulli processes with heterogeneous rates. Building on our previous structural result that optimal policies are of exhaustive type, we formulate a discounted-cost Markov decision process and develop an exhaustive-assignment actor-critic architecture that enforces exhaustive service by construction and learns only the next-queue allocation for idle robots. Unlike the exhaustive-serve-longest (ESL) queue rule, whose optimality is known only under symmetry, the proposed policy adapts to asymmetry in the arrival rates. Across different server-to-location ratios, loads, and asymmetric arrival profiles, the proposed policy consistently achieves lower discounted holding cost and smaller mean queue length than the ESL baseline, while remaining near-optimal on instances where an optimal benchmark is available. These results show that structure-aware actor-critic methods are an effective approach to real-time multi-robot scheduling.
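For concreteness, the discrete-time model and the ESL baseline can be simulated in a few lines. This is a sketch under the model assumptions stated above; `simulate_esl`, its parameters, and the tie-breaking choices are illustrative, not taken from the paper.

```python
import random

def simulate_esl(arrival_rates, n_robots, horizon=200, gamma=0.99, seed=0):
    # Illustrative sketch of the discrete-time model, not the paper's code.
    rng = random.Random(seed)
    n = len(arrival_rates)
    queues = [0] * n
    # Each robot is ('serve', loc) or ('travel', dest); travel takes one slot.
    robots = [('serve', i) for i in range(n_robots)]  # assumes n_robots <= n
    disc_cost = 0.0
    for t in range(horizon):
        for i, p in enumerate(arrival_rates):         # Bernoulli arrivals
            if rng.random() < p:
                queues[i] += 1
        claimed = {loc for _, loc in robots}          # one robot per location
        next_robots = []
        for mode, loc in robots:
            if mode == 'travel':
                next_robots.append(('serve', loc))    # travel delay elapses
            elif queues[loc] > 0:
                queues[loc] -= 1                      # exhaustive: serve until empty
                next_robots.append(('serve', loc))
            else:
                # ESL: idle robot heads to the longest unclaimed queue.
                free = [i for i in range(n) if i == loc or i not in claimed]
                dest = max(free, key=lambda i: queues[i])
                if dest == loc:
                    next_robots.append(('serve', loc))
                else:
                    claimed.discard(loc)
                    claimed.add(dest)
                    next_robots.append(('travel', dest))
        robots = next_robots
        disc_cost += (gamma ** t) * sum(queues)       # per-slot holding cost
    return disc_cost, queues
```

Running this with asymmetric rates, e.g. `simulate_esl([0.2, 0.5, 0.1], n_robots=2)`, gives a discounted-holding-cost baseline against which a learned policy can be compared.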