Faster Rates for Private Adversarial Bandits

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies differentially private adversarial multi-armed bandits (MAB) and MAB with expert advice. We propose the first general privatization framework, unifying the central and local privacy settings via a perturbation-calibration-scaling technique. Theoretically, we establish the first rigorous separation between central and local privacy in adversarial MAB. Algorithmically, we design the first differentially private algorithms for the expert-advice setting. Under central privacy, our adversarial MAB algorithm achieves regret $O(\sqrt{KT}/\sqrt{\varepsilon})$, improving upon the prior best $O(\sqrt{KT \log(KT)}/\varepsilon)$. For MAB with expert advice, we derive three novel sublinear regret bounds, covering distinct regimes of $K$ (number of arms), $N$ (number of experts), and $\varepsilon$ (privacy budget), thereby breaking the linear-regret barrier previously inherent in local privacy.

📝 Abstract
We design new differentially private algorithms for the problems of adversarial bandits and bandits with expert advice. For adversarial bandits, we give a simple and efficient conversion of any non-private bandit algorithm to a private bandit algorithm. Instantiating our conversion with existing non-private bandit algorithms gives a regret upper bound of $O\left(\frac{\sqrt{KT}}{\sqrt{\epsilon}}\right)$, improving upon the existing upper bound $O\left(\frac{\sqrt{KT \log(KT)}}{\epsilon}\right)$ for all $\epsilon \leq 1$. In particular, our algorithms allow for sublinear expected regret even when $\epsilon \leq \frac{1}{\sqrt{T}}$, establishing the first known separation between central and local differential privacy for this problem. For bandits with expert advice, we give the first differentially private algorithms, with expected regret $O\left(\frac{\sqrt{NT}}{\sqrt{\epsilon}}\right)$, $O\left(\frac{\sqrt{KT\log(N)}\log(KT)}{\epsilon}\right)$, and $\tilde{O}\left(\frac{N^{1/6}K^{1/2}T^{2/3}\log(NT)}{\epsilon^{1/3}} + \frac{N^{1/2}\log(NT)}{\epsilon}\right)$, where $K$ and $N$ are the number of actions and experts respectively. These rates allow us to obtain sublinear regret for different combinations of small and large $K$, $N$, and $\epsilon$.
Problem

Research questions and friction points this paper is trying to address.

Design private algorithms for adversarial bandits
Improve regret bounds for private bandit algorithms
Establish privacy-regret trade-offs for expert advice
Innovation

Methods, ideas, or system contributions that make the work stand out.

Converts non-private bandit algorithms to private
Improves regret bound to $O(\sqrt{KT}/\sqrt{\varepsilon})$
First private algorithms for expert advice bandits
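The paper's actual conversion is not reproduced here, but the black-box idea behind it can be illustrated with a minimal sketch: wrap any non-private bandit learner so that each observed loss is perturbed with calibrated Laplace noise (scale roughly $1/\varepsilon$) before the base algorithm ever sees it. The `Exp3` base learner, the noise calibration, and the `PrivateBanditWrapper` class below are all illustrative assumptions, not the paper's construction.

```python
import math
import random


class Exp3:
    """Standard (non-private) EXP3 learner for K-armed adversarial bandits."""

    def __init__(self, n_arms, eta):
        self.n_arms = n_arms
        self.eta = eta  # learning rate, e.g. sqrt(log(K) / (K * T))
        self.weights = [1.0] * n_arms
        self.probs = [1.0 / n_arms] * n_arms

    def select_arm(self):
        total = sum(self.weights)
        self.probs = [w / total for w in self.weights]
        r, acc = random.random(), 0.0
        for i, p in enumerate(self.probs):
            acc += p
            if r <= acc:
                return i
        return self.n_arms - 1

    def update(self, arm, loss):
        # Importance-weighted estimate of the played arm's loss.
        est = loss / self.probs[arm]
        self.weights[arm] *= math.exp(-self.eta * est)


def laplace(scale):
    """Sample Laplace(0, scale) as a difference of two exponentials."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)


class PrivateBanditWrapper:
    """Hypothetical privatization wrapper (illustrative, not the paper's
    method): perturbs each observed loss with Laplace noise of scale
    1/epsilon before feeding it to the unmodified base algorithm."""

    def __init__(self, base, epsilon):
        self.base = base
        self.epsilon = epsilon

    def select_arm(self):
        return self.base.select_arm()

    def update(self, arm, loss):
        noisy_loss = loss + laplace(1.0 / self.epsilon)
        self.base.update(arm, noisy_loss)
```

A usage loop would alternate `select_arm()` and `update(arm, loss)` each round; the base learner never touches raw losses, so its privacy rests entirely on the noise injection. The real framework additionally calibrates and rescales the perturbation to recover the $1/\sqrt{\varepsilon}$ (rather than $1/\varepsilon$) dependence reported above.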