🤖 AI Summary
This work addresses the problem of minimizing worst-case weighted latency in persistent monitoring tasks performed by multiple robots on weighted graphs. The standard worst-case latency objective is evaluated over the entire time horizon, so it may fail to distinguish strategies with poor transient behavior but strong asymptotic performance. To address this, the authors introduce a family of tail-performance objectives that generalize the standard objective. They then reformulate the resulting problem as an equivalent event-driven Markov decision process with an average-reward criterion, the Tail Worst-case Latency-Optimizing MDP (TWLO-MDP). The theoretical analysis establishes the existence of optimal strategies, approximation by periodic solutions to arbitrary accuracy, and reductions to event-driven decision models with discretized waiting times. Building on this framework, the authors develop reinforcement-learning-based solution methods and introduce M2Bench, a multi-robot monitoring benchmark for evaluating and comparing heuristic and learning-based algorithms. Experiments on synthetic and realistic monitoring scenarios show that the proposed methods reduce worst-case weighted latency and outperform representative baselines.
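For reference, here is one way to formalize the standard objective and a tail variant. The notation is our own assumption and the paper's exact definitions may differ. It uses node weights $\phi_i$, the latency $L_i^\pi(t)$ of node $i$ under joint strategy $\pi$, and a parameterized objective $J_T$. The standard objective corresponds to $T = 0$.

```latex
% Assumed notation: \phi_i > 0 is the weight of node i \in V; L_i^\pi(t) is the time
% elapsed since node i was last visited by any robot under joint strategy \pi,
% with L_i^\pi(0) = 0.
J_T(\pi) = \sup_{t \ge T} \, \max_{i \in V} \, \phi_i \, L_i^\pi(t), \qquad T \ge 0,
\qquad
J_\infty(\pi) = \lim_{T \to \infty} J_T(\pi).
```

Because the supremum is taken over a shrinking set of times, $J_T(\pi)$ is nonincreasing in $T$. Hence $J_\infty(\pi)$ exists (possibly $+\infty$) and satisfies $J_\infty(\pi) \le J_0(\pi)$. A strategy with a poor transient can therefore have a large $J_0$ but a small $J_\infty$.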
📝 Abstract
We study multi-robot persistent monitoring on weighted graphs, where node weights encode monitoring priorities and edge weights encode travel distances. The goal is to design joint robot trajectories that minimize the worst-case weighted latency across all nodes over an infinite time horizon. The widely adopted worst-case latency objective evaluates team performance over the entire time horizon and therefore may fail to distinguish strategies with poor transient behavior but strong asymptotic performance. To address this limitation, we propose a family of tail-performance objectives that generalize the standard objective and study the resulting functional optimization problems. We establish several key theoretical properties, including the existence of optimal strategies, relationships among the proposed objectives and their corresponding optimization problems, approximation by periodic solutions to arbitrary accuracy, and reductions to event-driven decision models with discretized waiting times. Building on these results, we construct an equivalent event-driven Markov decision process (MDP), called the Tail Worst-case Latency-Optimizing Markov Decision Process (TWLO-MDP), which reformulates the tail-performance objective as a standard average-reward criterion. We then develop reinforcement-learning-based solution methods for the TWLO-MDP and introduce the multi-robot monitoring benchmark (M2Bench), a unified platform that supports the evaluation and comparison of heuristic and learning-based monitoring algorithms. Experiments on synthetic and realistic monitoring scenarios show that our methods effectively reduce the worst-case weighted latency and outperform representative baselines.
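The following Python sketch makes the transient-versus-asymptotic issue concrete. It computes the worst-case weighted latency of simple patrol strategies, either over the whole (finite) horizon or only from a time `tail_start` onward. It is not the authors' TWLO-MDP or reinforcement-learning method. Everything in it is a hypothetical illustration, built on these assumptions:

* The helper names `visit_times` and `worst_weighted_latency` are our own.
* Each robot follows a "prefix, then repeat a closed loop" encoding.
* Robots never wait, and edge lengths are positive.
* All latencies are 0 at $t = 0$.
* A finite horizon stands in for the infinite one.

```python
from typing import Dict, FrozenSet, Iterator, List, Sequence, Tuple

Dist = Dict[FrozenSet[str], float]          # undirected edge -> positive travel time
Robot = Tuple[Sequence[str], Sequence[str]]  # hypothetical encoding: (prefix, loop)


def visit_times(prefix: Sequence[str], loop: Sequence[str], dist: Dist,
                horizon: float) -> Dict[str, List[float]]:
    """Times at which one robot visits each node, up to `horizon`.

    The robot starts at prefix[0] at t = 0, walks `prefix` once, then repeats
    the closed walk `loop` forever (zero waiting assumed).
    """
    if len(loop) < 2 or loop[0] != loop[-1] or loop[0] != prefix[-1]:
        raise ValueError("loop must be a closed walk starting at prefix[-1]")

    def successors() -> Iterator[str]:
        yield from prefix[1:]
        while True:                  # terminates below because edge lengths are positive
            yield from loop[1:]

    node, t = prefix[0], 0.0
    visits: Dict[str, List[float]] = {node: [t]}
    for nxt in successors():
        t += dist[frozenset((node, nxt))]   # KeyError if the nodes are not adjacent
        if t > horizon:
            break
        node = nxt
        visits.setdefault(node, []).append(t)
    return visits


def worst_weighted_latency(robots: Sequence[Robot], weights: Dict[str, float],
                           dist: Dist, horizon: float,
                           tail_start: float = 0.0) -> float:
    """sup over t in [tail_start, horizon] of max_i weights[i] * latency_i(t).

    tail_start = 0 gives the full-horizon worst case; a large tail_start
    approximates a tail objective that ignores the transient.
    """
    # Every node is treated as just visited at t = 0; the horizon closes the last gap.
    times = {v: {0.0, float(horizon)} for v in weights}
    for prefix, loop in robots:
        for v, ts in visit_times(prefix, loop, dist, horizon).items():
            times[v].update(ts)
    worst = 0.0
    for v, ts in times.items():
        ordered = sorted(ts)
        for a, b in zip(ordered, ordered[1:]):
            # On [a, b) latency grows linearly from 0 toward b - a, so the
            # supremum over t >= tail_start is b - a whenever b > tail_start.
            if b > tail_start:
                worst = max(worst, weights[v] * (b - a))
    return worst


# Usage: path graph A - B - C with unit edges; node C has double priority.
dist = {frozenset(("A", "B")): 1.0, frozenset(("B", "C")): 1.0}
weights = {"A": 1.0, "B": 1.0, "C": 2.0}
loop = ["A", "B", "C", "B", "A"]
direct = [(["A"], loop)]                      # starts the loop immediately
detour = [(["A", "B", "A", "B", "A"], loop)]  # wasteful transient, same loop

print(worst_weighted_latency(direct, weights, dist, 40.0))                   # 8.0
print(worst_weighted_latency(detour, weights, dist, 40.0))                   # 12.0
print(worst_weighted_latency(detour, weights, dist, 40.0, tail_start=20.0))  # 8.0
```

Both strategies share the same steady-state loop. The full-horizon score nonetheless penalizes the detour, because node C waits 6 time units for its first visit. The tail score from $t = 20$ onward reports the same value, 8.0, for both strategies. This toy computation only illustrates how the choice of evaluation window changes the score. It does not reproduce the paper's objectives.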