🤖 AI Summary
This study addresses the coordination challenge in decentralized inspection and maintenance planning for multi-component engineering systems by formulating the problem as a partially observable Markov decision process and employing multi-agent deep reinforcement learning (MADRL) approaches. The work covers a spectrum of training paradigms, from fully centralized to fully decentralized, including value-decomposition and actor-critic architectures. A novel set of benchmark environments with tunable redundancy is introduced to systematically evaluate, for the first time, the coordination capabilities and policy optimality of various MADRL algorithms. Experimental results demonstrate that decentralized policies achieve near-optimal performance in low-redundancy series systems, whereas coordination complexity increases significantly with higher redundancy. Despite this, all MADRL strategies consistently outperform an optimized heuristic baseline across the tested scenarios.
📝 Abstract
Inspection and maintenance (I&M) planning involves sequential decision-making under uncertainty and incomplete information, and can be modeled as a partially observable Markov decision process (POMDP). While single-agent deep reinforcement learning provides approximate solutions to POMDPs, it does not scale well to multi-component systems. Scalability can be achieved through multi-agent deep reinforcement learning (MADRL), which decentralizes decision-making across multiple agents, each locally controlling an individual component. However, this decentralization can induce cooperation pathologies that degrade the optimality of the learned policies. To examine these effects in I&M planning, we introduce a set of deteriorating systems in which redundancy is varied systematically. These benchmark environments are designed such that computation of centralized (near-)optimal policies remains tractable, enabling direct comparison of solution methods. We implement and benchmark a broad set of MADRL algorithms spanning training paradigms from fully centralized to fully decentralized, including value-factorization and actor-critic methods. Our results show a clear effect of redundancy on coordination: MADRL algorithms achieve near-optimal performance in series-like settings, whereas increasing redundancy amplifies coordination challenges and can lead to optimality losses. Nonetheless, decentralized agents learn structured policies that consistently outperform optimized heuristic baselines, highlighting both the promise and current limitations of decentralized learning for scalable maintenance planning.
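The benchmark idea described above (deteriorating multi-component systems with tunable redundancy, where agents trade off inspection and replacement costs against system-failure risk) can be sketched as a toy k-out-of-n environment. This is a minimal illustrative sketch, not the paper's actual benchmark: all class names, damage states, and cost parameters below are assumptions chosen for clarity.

```python
import random

class KOutOfNMaintenanceEnv:
    """Toy k-out-of-n deteriorating system (illustrative, not the paper's benchmark).

    Each of n components degrades through hidden damage states 0 (intact) .. 3 (failed).
    The system functions while at least k components are unfailed, so redundancy is
    tuned via k: k = n gives a series system, k = 1 a fully parallel one.
    Per-component actions: 0 = do nothing, 1 = inspect, 2 = replace.
    Agents only observe a component's true state by paying an inspection cost;
    otherwise only outright failure is visible (observation -1 means "unknown").
    """

    FAILED = 3  # terminal damage state of a component

    def __init__(self, n=4, k=4, degrade_prob=0.3, inspect_cost=1.0,
                 replace_cost=10.0, failure_penalty=100.0, seed=0):
        self.n, self.k = n, k
        self.degrade_prob = degrade_prob
        self.inspect_cost = inspect_cost
        self.replace_cost = replace_cost
        self.failure_penalty = failure_penalty
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.states = [0] * self.n   # true (hidden) damage states
        return [0] * self.n          # all components known intact at the start

    def step(self, actions):
        cost = 0.0
        obs = []
        for i, a in enumerate(actions):
            if a == 2:               # replace: pay to renew the component
                cost += self.replace_cost
                self.states[i] = 0
            # stochastic one-step degradation of every non-failed component
            if self.states[i] < self.FAILED and self.rng.random() < self.degrade_prob:
                self.states[i] += 1
            if a == 1:               # inspect: pay for a perfect observation
                cost += self.inspect_cost
                obs.append(self.states[i])
            else:                    # otherwise only failure is observable
                obs.append(self.FAILED if self.states[i] == self.FAILED else -1)
        working = sum(s < self.FAILED for s in self.states)
        if working < self.k:         # k-out-of-n requirement violated: system down
            cost += self.failure_penalty
        return obs, -cost            # shared reward = negative total cost
```

In this sketch, decentralized agents would each pick one entry of `actions` from their own observation history, while a centralized planner sees the joint observation vector; the shared reward is what creates the coordination problem in redundant (k < n) configurations.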