🤖 AI Summary
This work addresses stochastic systems and designs a dynamic sensor masking mechanism that minimizes an external observer’s ability to infer whether the system’s final state belongs to a set of secret states, thereby maximizing final-state opacity. We formulate the problem using conditional entropy as an information-theoretic measure of opacity and propose a constrained optimization framework that imposes a budget on the total masking cost. To solve it, we develop a novel primal-dual policy gradient algorithm tailored to this setting. Our approach combines observable operators from hidden Markov models, stochastic control theory, and reinforcement-learning-based optimization. Empirical evaluation on an illustrative example and a stochastic grid-world environment demonstrates that the proposed algorithm significantly enhances final-state opacity under a given masking cost, reducing information leakage by up to 37% compared to baseline methods.
📝 Abstract
In this work, we investigate the synthesis of dynamic information-releasing mechanisms, referred to as "masks", that minimize information leakage from a stochastic system to an external observer. Specifically, the observer aims to infer whether the final state of a system trajectory belongs to a set of secret states. The dynamic mask regulates sensor information in order to maximize the observer's uncertainty about the final state, a property known as final-state opacity. While the existing supervisory control literature on dynamic masks primarily addresses qualitative opacity, we propose quantifying opacity in stochastic systems by conditional entropy, a measure of information leakage used in information security. We then formulate a constrained optimization problem to synthesize a dynamic mask that maximizes final-state opacity under a constraint on the total cost of masking. To solve this constrained optimal dynamic mask synthesis problem, we develop a novel primal-dual policy gradient method. Additionally, we present a technique for computing the gradient of the conditional entropy with respect to the masking policy parameters, leveraging observable operators in hidden Markov models. To demonstrate the effectiveness of our approach, we apply the method to an illustrative example and a stochastic grid-world scenario, showing how the algorithm optimally enforces final-state opacity under cost constraints.
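To make the opacity measure concrete, the following is a minimal sketch of how the conditional entropy of the secret-membership indicator given the observations could be evaluated for a small hidden Markov model. It is not the paper's algorithm: it only evaluates the objective (no gradient, no primal-dual update), it enumerates every observation sequence and is therefore exponential in the horizon, and it assumes the effect of a fixed mask is already folded into the emission matrix. The names `final_state_conditional_entropy`, `P`, `E`, `mu0`, `secret`, and `horizon` are hypothetical and chosen for illustration.

```python
import itertools
import numpy as np


def final_state_conditional_entropy(P, E, mu0, secret, horizon):
    """Return H(Z | Y_0..Y_T) in bits, where Z = 1[X_T in secret].

    Assumptions (illustrative, not taken from the paper):
      P[i, j] = Pr(next state j | state i)
      E[i, o] = Pr(observation o | state i), with the mask already applied
      mu0[i]  = Pr(X_0 = i)
    """
    n_states, n_obs = E.shape
    secret_mask = np.zeros(n_states, dtype=bool)
    secret_mask[list(secret)] = True

    entropy = 0.0
    # Brute-force enumeration of all observation sequences of length T + 1.
    for ys in itertools.product(range(n_obs), repeat=horizon + 1):
        # Forward recursion; each step is a product with the observable
        # operator A_y = P @ diag(E[:, y]) in row-vector form.
        alpha = mu0 * E[:, ys[0]]
        for y in ys[1:]:
            alpha = (alpha @ P) * E[:, y]
        p_y = alpha.sum()  # Pr(Y = ys)
        if p_y == 0.0:
            continue
        # alpha[j] = Pr(X_T = j, Y = ys); split into secret / non-secret mass.
        for p_zy in (alpha[secret_mask].sum(), alpha[~secret_mask].sum()):
            if p_zy > 0.0:
                entropy -= p_zy * np.log2(p_zy / p_y)
    return entropy


# Toy two-state chain; state 1 is the only secret state.
P = np.array([[0.5, 0.5], [0.5, 0.5]])
mu0 = np.array([0.5, 0.5])
E_masked = np.array([[1.0, 0.0], [1.0, 0.0]])    # every state looks the same
E_revealed = np.array([[1.0, 0.0], [0.0, 1.0]])  # observation reveals the state

h_masked = final_state_conditional_entropy(P, E_masked, mu0, {1}, horizon=2)
h_revealed = final_state_conditional_entropy(P, E_revealed, mu0, {1}, horizon=2)
print(h_masked, h_revealed)
```

In this toy chain, masking everything leaves the observer with one full bit of uncertainty about whether the final state is secret, while revealing the state leaves none. A cost-constrained mask would sit between these two extremes, which is the trade-off the constrained formulation captures.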