🤖 AI Summary
This work addresses the Maximal Covering Location-Interdiction Problem (MCLIP), a challenging bilevel robust optimization problem, by proposing an adversarial learning–based dual-agent deep reinforcement learning framework. The approach models worst-case facility disruption as a strategic game between an upper-level location agent and a lower-level interdiction agent, with the aim of making coverage robust to interdiction. It introduces a dynamic adversarial training mechanism, in which the location agent is trained against an evolving interdiction agent, and an ensemble inference strategy that uses the trained interdiction agent as a surrogate. The framework is agnostic to network structure, and its adversarial learning paradigm shows potential for other bilevel optimization problems. Experiments on synthetic and real-world datasets indicate superior computational efficiency with solution quality that remains highly competitive with the baselines.
📝 Abstract
The Maximal Covering Location-Interdiction Problem (MCLIP) is a classic bi-level optimization problem that is fundamental to resilient infrastructure planning yet remains computationally intractable. Specifically, the upper level chooses facility locations to maximize coverage, while the lower level carries out a worst-case interdiction to minimize that coverage. The strong coupling between the two levels, combined with the high combinatorial complexity of each, renders traditional methods ineffective. To bridge this gap, we propose a Dual-Agent Deep Reinforcement Learning (DADRL) framework based on adversarial learning, comprising a location agent for the upper level and an interdiction agent for the lower level. Our contributions are threefold: (1) the location agent is trained simultaneously against an evolving interdiction agent, enabling it to capture the dynamic competitive interplay between the upper and lower levels; (2) to fully exploit the learned capabilities of the interdiction agent, we propose a Surrogate-based Ensemble Inference Strategy that uses the trained interdiction agent as a high-fidelity surrogate to guide the decisions of the location agent; (3) extensive experiments on synthetic and real-world datasets demonstrate that our approach achieves superior computational efficiency while maintaining highly competitive solution quality compared with the baselines. Furthermore, the DADRL framework is agnostic to network structure, and its underlying adversarial learning paradigm shows strong potential for solving other bi-level optimization problems.
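To make the bi-level structure concrete, the following is a minimal brute-force sketch of MCLIP on a toy instance. It is not the DADRL method and would not scale beyond a handful of sites. The function names, the parameters `p` (facilities to locate) and `r` (facilities the interdictor may remove), and the toy data are all assumptions introduced here for illustration.

```python
from itertools import combinations


def covered_demand(open_sites, coverage, demand):
    """Total demand of all points covered by at least one open site."""
    covered = set()
    for site in open_sites:
        covered |= coverage[site]
    return sum(demand[point] for point in covered)


def worst_case_interdiction(located, r, coverage, demand):
    """Lower level: remove r located sites so that remaining coverage is minimal."""
    best_value, best_attack = None, None
    for attack in combinations(sorted(located), r):
        survivors = set(located) - set(attack)
        value = covered_demand(survivors, coverage, demand)
        if best_value is None or value < best_value:
            best_value, best_attack = value, attack
    return best_value, best_attack


def solve_mclip_brute_force(p, r, coverage, demand):
    """Upper level: choose p sites maximizing coverage after the worst-case attack."""
    best_value, best_sites, best_attack = -1, None, None
    for sites in combinations(sorted(coverage), p):
        value, attack = worst_case_interdiction(sites, r, coverage, demand)
        if value > best_value:
            best_value, best_sites, best_attack = value, sites, attack
    return best_value, best_sites, best_attack


# Hypothetical toy instance: candidate site -> demand points it covers.
coverage = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}, "D": {1, 5}}
# Hypothetical demand weight of each point.
demand = {1: 10, 2: 5, 3: 8, 4: 7, 5: 6}

robust_value, robust_sites, attack = solve_mclip_brute_force(2, 1, coverage, demand)
print(robust_value, robust_sites, attack)  # 16 ('A', 'D') ('A',)
```

On this instance, the pair (A, C) covers all demand (36) when no site is interdicted but retains only 13 after the worst single interdiction, whereas (A, D) retains 16. The nested enumeration also shows why exact approaches become expensive: every candidate location set requires solving its own combinatorial interdiction problem.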