🤖 AI Summary
Reinforcement learning (RL) poses significant privacy risks when leveraging personal data, necessitating differentially private (DP) algorithms that preserve utility without compromising theoretical guarantees.
Method: We propose DP-TRPO, the first differentially private trust-region policy optimization algorithm with rigorous theoretical foundations and practical efficacy. It integrates Gaussian noise injection directly into the TRPO framework while preserving the convergence properties of the original policy gradient updates. Crucially, we establish a theoretical equivalence between privacy-induced perturbation and the trust-region constraint, circumventing the performance degradation typical of conventional DP-RL approaches.
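The paper's actual algorithm is not shown here, but the core mechanism it describes (clipping per-sample gradients, adding Gaussian noise calibrated to the clipping bound, then constraining the update to a trust region) can be sketched minimally. All function names, the Euclidean surrogate for TRPO's KL constraint, and the sensitivity calibration below are illustrative assumptions, not the authors' implementation:

```python
import math
import random

def clip_grad(g, clip_norm):
    """Scale gradient vector g so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in g))
    scale = min(1.0, clip_norm / max(norm, 1e-12))
    return [x * scale for x in g]

def dp_policy_gradient(per_sample_grads, clip_norm, sigma, rng):
    """Average clipped per-sample gradients, then add Gaussian noise
    (Gaussian mechanism); the noise std scales with the sensitivity
    clip_norm / n of the averaged gradient. Illustrative sketch only."""
    n = len(per_sample_grads)
    clipped = [clip_grad(g, clip_norm) for g in per_sample_grads]
    dim = len(clipped[0])
    mean = [sum(g[i] for g in clipped) / n for i in range(dim)]
    return [m + rng.gauss(0.0, sigma * clip_norm / n) for m in mean]

def trust_region_step(noisy_grad, delta):
    """Shrink the noisy update so it stays inside a ball of radius
    sqrt(2 * delta) -- a Euclidean stand-in for TRPO's KL trust region,
    used here only to illustrate the noise/trust-region interplay."""
    norm = math.sqrt(sum(x * x for x in noisy_grad))
    radius = math.sqrt(2.0 * delta)
    scale = min(1.0, radius / max(norm, 1e-12))
    return [x * scale for x in noisy_grad]
```

The point of the sketch is the interaction the paper formalizes: the same clipping bound that bounds sensitivity (and thus the required noise) also bounds the step, so the privacy perturbation and the trust-region radius can be traded off against each other rather than fighting one another.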
Results: Empirical evaluation on standard RL benchmarks—including MuJoCo—demonstrates that DP-TRPO substantially outperforms existing DP-RL methods. It maintains high policy performance even under tight privacy budgets (ε = 1–10), scales to more complex tasks, and establishes a robust new paradigm for online RL in privacy-sensitive applications.
📝 Abstract
Motivated by the growing deployment of reinforcement learning in the real world, where it often consumes large amounts of personal data, we introduce a differentially private (DP) policy gradient algorithm. We show that, in this setting, enforcing differential privacy can be reduced to computing appropriate trust regions, thus avoiding the sacrifice of the theoretical properties enjoyed by non-private methods. Consequently, it is possible to strike the right trade-off between privacy noise and trust-region size to obtain a performant differentially private policy gradient algorithm. We then demonstrate its performance empirically on various benchmarks. Our results, and the complexity of the tasks addressed, represent a significant improvement over existing DP algorithms in online RL.