Privacy Preserving Reinforcement Learning with One-Sided Feedback

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses reinforcement learning in multi-dimensional continuous state and action spaces under one-sided feedback, where the agent observes only partial state information and obtains reward information for only a subset of state-action pairs, and where both learning efficiency and privacy must be preserved. The paper proposes the POOL algorithm, which for the first time integrates differential privacy into this setting by modeling and optimizing over partially observable feedback. Theoretical analysis shows that the algorithm's sample complexity matches the known lower bound for the non-private setting, which suggests that strong privacy guarantees need not come at the cost of learning efficiency here.

📝 Abstract

We study reinforcement learning (RL) in multi-dimensional continuous state and action spaces with one-sided feedback, where the agent receives partial observations of the state and obtains reward information for only a subset of the state-action space at each time step. This setting introduces substantial challenges in both learning efficiency and privacy preservation. To address these challenges, we propose POOL, a novel privacy-preserving RL algorithm. We conduct a comprehensive theoretical analysis of POOL, deriving a sample complexity bound that matches the known lower bounds for non-private RL. The bound is expressed in terms of E_rho (the privacy parameter), H (the time horizon), and alpha (the optimality-gap parameter). Our findings show that it is possible to enforce strong privacy guarantees while maintaining high learning efficiency, marking a significant step toward practical, privacy-aware RL in multi-dimensional environments with one-sided feedback.
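
To make the two ingredients in the abstract concrete, the sketch below illustrates one-sided feedback and a differentially private estimate on a toy problem. It is not the POOL algorithm, which the abstract does not specify. Everything here is an assumption made for illustration: the one-dimensional action grid, the rule that choosing an action reveals reward information for all actions at or below it, the function names `observed_actions`, `laplace_noise`, and `private_mean`, and the use of the standard Laplace mechanism with privacy parameter `epsilon`.

```python
import random


def observed_actions(grid, chosen):
    """Hypothetical one-sided feedback rule: choosing `chosen` reveals
    reward information only for grid actions less than or equal to it."""
    return [a for a in grid if a <= chosen]


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two independent
    exponential variables, each with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, epsilon, rng):
    """Laplace mechanism for the mean of values clipped to [0, 1].

    Changing one value moves the mean by at most 1/n, so adding
    Laplace noise with scale 1/(n * epsilon) gives epsilon-DP
    (assumption: n is fixed and public)."""
    clipped = [min(1.0, max(0.0, v)) for v in values]
    n = len(clipped)
    sensitivity = 1.0 / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    grid = [0.0, 0.25, 0.5, 0.75, 1.0]
    # Only the actions at or below the chosen one are observed.
    print(observed_actions(grid, 0.5))
    # Noisy estimate of the mean reward from a small batch of observations.
    rewards = [0.2, 0.9, 0.4, 0.7]
    print(private_mean(rewards, epsilon=1.0, rng=rng))
```

A smaller `epsilon` increases the noise scale, which is the privacy-utility tension the abstract refers to; the paper's claim is that its sample complexity bound still matches the non-private lower bound.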

Problem

Research questions and friction points this paper is trying to address.

Privacy Preserving

Reinforcement Learning

One-Sided Feedback

Continuous State Space

Partial Observations

Innovation

Methods, ideas, or system contributions that make the work stand out.

privacy-preserving reinforcement learning

one-sided feedback

sample complexity