🤖 AI Summary
This work addresses the challenge of applying traditional additive-noise differential privacy mechanisms to non-numeric symbolic trajectories—such as those generated by Markov chains or finite-state automata—which often reside in discrete, combinatorial spaces. The paper presents the first efficient adaptation of the permute-and-flip mechanism to symbolic sequences over a finite alphabet, circumventing the need for exponential enumeration of candidate trajectories. The proposed method provides theoretical utility guarantees comparable to state-of-the-art approaches while significantly enhancing practicality through a combination of symbolic system modeling and stochastic approximate sampling. Empirical evaluation on real-world traffic datasets demonstrates that, under standard privacy budgets, the approach reduces trajectory error by up to 55% compared to existing methods.
📝 Abstract
Privacy techniques have been developed for data-driven systems, but systems that produce non-numeric data cannot use typical additive-noise mechanisms. Therefore, we develop a new mechanism for privatizing state trajectories of symbolic systems that may be represented as words over a finite alphabet. Such systems include Markov chains, Markov decision processes, and finite-state automata, and we protect their symbolic trajectories with differential privacy. The mechanism we develop randomly selects a private approximation to be released in place of the original sensitive word, with a bias towards low-error private words. This work is based on the permute-and-flip mechanism for differential privacy, which can be applied to non-numeric data. However, a naïve implementation would have to enumerate an exponentially large list of words to generate a private word. As a result, we develop a new mechanism that generates private words without ever needing to enumerate such a list. We prove that the accuracy of our mechanism is never worse than the prior state of the art, and we empirically show on a real traffic dataset that it introduces up to $55\%$ less error than the prior state of the art under a conventional privacy implementation.
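To make the abstract's discussion concrete, the sketch below shows the *naïve* enumeration-based approach the paper improves upon: standard permute-and-flip run over every word of a fixed length, with utility taken to be negative Hamming distance to the sensitive word. This is an illustrative assumption, not the paper's mechanism; the candidate set has $|\Sigma|^n$ entries, which is exactly the exponential blow-up the paper's sampling method avoids.

```python
import itertools
import math
import random

def permute_and_flip(candidates, utility, epsilon, sensitivity=1.0):
    """Standard permute-and-flip: visit candidates in random order and
    accept candidate r with probability exp(eps*(q(r)-q*)/(2*sensitivity)),
    where q* is the maximum utility. Satisfies epsilon-differential privacy."""
    q_star = max(utility(c) for c in candidates)
    order = list(candidates)
    random.shuffle(order)
    for c in order:
        p = math.exp(epsilon * (utility(c) - q_star) / (2 * sensitivity))
        if random.random() <= p:
            return c
    # Unreachable: a utility maximizer is accepted with probability 1.

# Toy example (assumed setup): privatize a length-4 word over {a, b}.
# Utility = negative Hamming distance to the sensitive word, so low-error
# words are released with higher probability.
alphabet = "ab"
secret = "abba"
candidates = ["".join(w) for w in itertools.product(alphabet, repeat=len(secret))]

def utility(w):
    return -sum(x != y for x, y in zip(w, secret))

private_word = permute_and_flip(candidates, utility, epsilon=1.0)
```

Even at length 4 the candidate list has $2^4 = 16$ words; for realistic trajectory lengths it grows exponentially, which motivates the paper's mechanism that samples a private word without building this list.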