🤖 AI Summary
This paper addresses the sensor scheduling problem in graph-structured intrusion detection, formulated as a two-player zero-sum game between a defender and an attacker. The defender seeks to minimize both the missed-detection probability and the scheduling cost while accounting for the attacker's optimal path selection and for uncertainty in the sensor models and payoff matrix. The authors propose a distributed Weighted Majority algorithm to compute Nash equilibrium strategies and introduce a bandit-feedback online learning framework that yields asymptotically optimal policy estimates when the sensor models are unknown. They prove convergence guarantees and establish a high-probability order-optimal regret bound. Simulation results show that the approach significantly outperforms baseline methods in both known- and unknown-payoff settings, achieving over a threefold improvement in scheduling efficiency.
📝 Abstract
We study the problem of sensor scheduling for an intrusion detection task. We model this as a two-player zero-sum game over a graph, where the defender (Player 1) seeks the optimal strategy for scheduling sensor orientations to minimize the probability of missed detection at minimal cost, while the intruder (Player 2) seeks the optimal path selection strategy to maximize the missed-detection probability at minimal cost. The defender's strategy space grows exponentially with the number of sensors, making direct computation of the Nash equilibrium (NE) strategies computationally expensive. To tackle this, we propose a distributed variant of the Weighted Majority algorithm that exploits the structure of the game's payoff matrix, enabling efficient computation of the NE strategies with provable convergence guarantees. Next, we consider a more challenging scenario where the defender lacks knowledge of the true sensor models and, consequently, of the game's payoff matrix. For this setting, we develop online learning algorithms that leverage bandit feedback from the sensors to estimate the NE strategies. Building on existing results from perturbation theory and online learning in matrix games, we derive high-probability order-optimal regret bounds for our algorithms. Finally, through simulations, we demonstrate the empirical performance of our proposed algorithms in both known- and unknown-payoff scenarios.
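To make the Weighted Majority idea concrete, here is a minimal sketch of the standard (non-distributed) multiplicative-weights method for approximating the NE of a zero-sum matrix game: the defender plays a mixed strategy proportional to her weights, the intruder best-responds, and the weights are discounted exponentially by the incurred losses. This is the classical algorithm the paper builds on, not the paper's distributed variant, and the payoff matrix, function name, and parameter values are illustrative assumptions.

```python
import numpy as np

def weighted_majority_ne(A, T=2000, eta=0.05):
    """Approximate a zero-sum game's NE via multiplicative weights.

    A[i, j] is the loss to the row player (defender) when she plays
    action i and the column player (intruder) plays action j; the
    intruder best-responds to maximize the defender's expected loss.
    Returns the time-averaged strategies, which converge to an NE.
    """
    m, n = A.shape
    w = np.ones(m)            # defender's weights over actions
    avg_row = np.zeros(m)     # running sum of defender strategies
    avg_col = np.zeros(n)     # empirical intruder play
    for _ in range(T):
        x = w / w.sum()              # defender's mixed strategy
        j = int(np.argmax(x @ A))    # intruder's best response
        w *= np.exp(-eta * A[:, j])  # multiplicative weight update
        avg_row += x
        avg_col[j] += 1
    return avg_row / T, avg_col / T

# Illustrative 2x2 game (matching pennies with loss matrix I):
# the NE is uniform play for both players, with game value 0.5.
A = np.array([[1.0, 0.0], [0.0, 1.0]])
x, y = weighted_majority_ne(A)
```

The averaged strategies approach the equilibrium at a rate of roughly O(sqrt(log m / T)); the paper's contribution is a distributed variant that avoids enumerating the exponentially large strategy space by exploiting the payoff matrix's structure.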