🤖 AI Summary
This paper investigates the three-way trade-off among global safety constraints, cumulative regret minimization, and local differential privacy (LDP) in multi-agent linear stochastic bandits. We introduce *safety-set geometric sharpness*—a novel, unified metric that quantitatively characterizes the interplay among safety, privacy, and regret—and define Pareto-optimal LDP privacy levels that cannot be unilaterally improved under a given regret budget. Our method integrates LDP mechanisms, safety-constrained optimization, and geometric analysis to derive tight, quantitative trade-off bounds. We prove that the proposed adaptive privacy allocation strategy achieves an $\mathcal{O}(\sqrt{T})$ safe regret bound on standard safety sets—substantially improving upon existing baselines that jointly enforce safety and privacy.
📝 Abstract
We consider a collection of linear stochastic bandit problems, each modeling the random response of a different agent to proposed interventions, coupled together by a global safety constraint. We assume a central coordinator must choose actions to play on each bandit with the objective of regret minimization, while also ensuring that the expected response of all agents satisfies the global safety constraints at each round, in spite of uncertainty about the bandits' parameters. The agents consider their observed responses to be private, and to protect this sensitive information, data sharing with the central coordinator is performed under local differential privacy (LDP). However, providing a higher level of privacy to different agents has consequences in terms of safety and regret. We formalize these trade-offs by building on the notion of the sharpness of the safety set, a measure of how the geometric properties of the safe set affect the growth of regret, and propose a unilaterally unimprovable vector of privacy levels for the different agents given a maximum regret budget.
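The requirement that the expected response stay safe at every round, despite parameter uncertainty, is typically enforced pessimistically: an action is only certified safe if its worst-case response within the confidence set satisfies the constraint. The sketch below illustrates this idea for a finite candidate set with a linear reward parameter and a separate linear safety parameter; all names (`safe_ucb_action`, `theta_hat`, `mu_hat`, `beta`, `c`) are hypothetical and the paper's actual algorithm is not specified here.

```python
import numpy as np


def safe_ucb_action(actions: np.ndarray, theta_hat: np.ndarray,
                    mu_hat: np.ndarray, beta: float, c: float) -> np.ndarray:
    """Choose an optimistic action from the certifiably safe subset.

    actions:   (K, d) candidate interventions.
    theta_hat: (d,) estimated reward parameter.
    mu_hat:    (d,) estimated safety parameter.
    beta:      confidence radius (a simple Euclidean ball is assumed here).
    c:         safety threshold: the constraint is <mu, x> <= c.
    """
    est_reward = actions @ theta_hat
    est_safety = actions @ mu_hat
    norms = np.linalg.norm(actions, axis=1)
    # Pessimism for safety: require the *worst-case* expected response
    # over the confidence ball to satisfy the constraint.
    safe = est_safety + beta * norms <= c
    if not safe.any():
        raise ValueError("no certifiably safe action under current uncertainty")
    # Optimism for reward: among safe actions, pick the highest upper bound.
    ucb = np.where(safe, est_reward + beta * norms, -np.inf)
    return actions[int(np.argmax(ucb))]
```

Shrinking the safe subset this way is where the geometry of the safety set enters: a "sharper" set leaves fewer certifiably safe actions near the optimum, slowing regret decay, and LDP noise inflates `beta`, tightening the same bottleneck.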