🤖 AI Summary
This paper investigates the three-way trade-off among global safety constraints, cumulative regret minimization, and local differential privacy (LDP) in multi-agent linear stochastic bandits. We introduce *safety-set geometric sharpness*—a novel, unified metric that quantitatively characterizes the interplay among safety, privacy, and regret—and define Pareto-optimal LDP privacy levels that cannot be unilaterally improved under a given regret budget. Our method integrates LDP mechanisms, safety-constrained optimization, and geometric analysis to derive tight, quantitative trade-off bounds. We prove that the proposed adaptive privacy allocation strategy achieves an $\mathcal{O}(\sqrt{T})$ safe regret bound on standard safety sets—substantially improving upon existing baselines that jointly enforce safety and privacy.
📝 Abstract
We consider a collection of linear stochastic bandit problems, each modeling the random response of a different agent to proposed interventions, coupled together by a global safety constraint. We assume a central coordinator must choose actions to play on each bandit with the objective of regret minimization, while also ensuring that the expected response of all agents satisfies the global safety constraints at each round, in spite of uncertainty about the bandits' parameters. The agents consider their observed responses to be private, and to protect this sensitive information, data sharing with the central coordinator is performed under local differential privacy (LDP). However, providing a higher level of privacy to different agents has consequences in terms of safety and regret. We formalize these trade-offs by building on the notion of the sharpness of the safety set, a measure of how the geometric properties of the safe set affect the growth of regret, and propose a unilaterally unimprovable vector of privacy levels for the different agents given a maximum regret budget.
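The requirement that the expected response stay safe at every round, despite parameter uncertainty, is typically enforced pessimistically: an action is only certified safe if its worst-case response within the confidence set satisfies the constraint. The sketch below illustrates this idea for a finite candidate set with a linear reward parameter and a separate linear safety parameter; all names (`safe_ucb_action`, `theta_hat`, `mu_hat`, `beta`, `c`) are hypothetical and the paper's actual algorithm is not specified here.

```python
import numpy as np


def safe_ucb_action(actions: np.ndarray, theta_hat: np.ndarray,
                    mu_hat: np.ndarray, beta: float, c: float) -> np.ndarray:
    """Choose an optimistic action from the certifiably safe subset.

    actions:   (K, d) candidate interventions.
    theta_hat: (d,) estimated reward parameter.
    mu_hat:    (d,) estimated safety parameter.
    beta:      confidence radius (a simple Euclidean ball is assumed here).
    c:         safety threshold: the constraint is <mu, x> <= c.
    """
    est_reward = actions @ theta_hat
    est_safety = actions @ mu_hat
    norms = np.linalg.norm(actions, axis=1)
    # Pessimism for safety: require the *worst-case* expected response
    # over the confidence ball to satisfy the constraint.
    safe = est_safety + beta * norms <= c
    if not safe.any():
        raise ValueError("no certifiably safe action under current uncertainty")
    # Optimism for reward: among safe actions, pick the highest upper bound.
    ucb = np.where(safe, est_reward + beta * norms, -np.inf)
    return actions[int(np.argmax(ucb))]
```

Shrinking the safe subset this way is where the geometry of the safety set enters: a "sharper" set leaves fewer certifiably safe actions near the optimum, slowing regret decay, and LDP noise inflates `beta`, tightening the same bottleneck.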