🤖 AI Summary
This paper studies adversarial multi-armed bandits (MAB) and MAB with expert advice under differential privacy. We propose the first general privatization framework, unifying the central and local privacy settings via a perturbation-calibration-scaling technique. Theoretically, we establish the first rigorous separation between central and local privacy in adversarial MAB. Algorithmically, we design the first differentially private algorithms for the expert-advice setting. Under central privacy, our adversarial MAB algorithm achieves regret $O(\sqrt{KT}/\sqrt{\varepsilon})$, improving upon the prior best $O(\sqrt{KT \log(KT)}/\varepsilon)$. For bandits with expert advice, we derive three novel sublinear regret bounds, covering distinct regimes of $K$ (number of arms), $N$ (number of experts), and $\varepsilon$ (privacy budget), thereby breaking the linear-regret barrier previously inherent under local privacy.
📝 Abstract
We design new differentially private algorithms for the problems of adversarial bandits and bandits with expert advice. For adversarial bandits, we give a simple and efficient conversion of any non-private bandit algorithm to a private bandit algorithm. Instantiating our conversion with existing non-private bandit algorithms gives a regret upper bound of $O\left(\frac{\sqrt{KT}}{\sqrt{\epsilon}}\right)$, improving upon the existing upper bound $O\left(\frac{\sqrt{KT \log(KT)}}{\epsilon}\right)$ for all $\epsilon \leq 1$. In particular, our algorithms allow for sublinear expected regret even when $\epsilon \leq \frac{1}{\sqrt{T}}$, establishing the first known separation between central and local differential privacy for this problem. For bandits with expert advice, we give the first differentially private algorithms, with expected regret $O\left(\frac{\sqrt{NT}}{\sqrt{\epsilon}}\right)$, $O\left(\frac{\sqrt{KT\log(N)}\log(KT)}{\epsilon}\right)$, and $\tilde{O}\left(\frac{N^{1/6}K^{1/2}T^{2/3}\log(NT)}{\epsilon^{1/3}} + \frac{N^{1/2}\log(NT)}{\epsilon}\right)$, where $K$ and $N$ are the number of actions and experts respectively. These rates allow us to get sublinear regret for different combinations of small and large $K$, $N$, and $\epsilon$.
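To make the black-box conversion idea concrete, here is a minimal sketch of how one might wrap a non-private bandit learner so it only ever sees privatized feedback. This is an illustration only, not the paper's actual mechanism: the `PrivateBanditWrapper` and `Exp3` classes, the Laplace noise scale `1/epsilon`, and the clipping constant are all assumptions chosen for the toy example; the paper's perturbation-calibration-scaling details are not reproduced here.

```python
import math
import random


def laplace_noise(scale):
    """Sample Laplace(0, scale) via the inverse CDF."""
    u = max(random.random() - 0.5, -0.5 + 1e-12)  # avoid log(0)
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))


class Exp3:
    """Tiny non-private EXP3 learner over K arms (toy baseline)."""

    def __init__(self, k, eta):
        self.k, self.eta = k, eta
        self.weights = [1.0] * k

    def _probs(self):
        s = sum(self.weights)
        return [w / s for w in self.weights]

    def select_arm(self):
        r, acc = random.random(), 0.0
        for i, p in enumerate(self._probs()):
            acc += p
            if r <= acc:
                return i
        return self.k - 1

    def update(self, arm, loss):
        p = self._probs()[arm]
        est = loss / p  # importance-weighted loss estimate
        est = max(min(est, 50.0), -50.0)  # clip for numerical stability (toy)
        self.weights[arm] *= math.exp(-self.eta * est)


class PrivateBanditWrapper:
    """Generic sketch: run any non-private bandit algorithm on
    Laplace-noised losses. Illustrates the 'privatize the feedback'
    idea only; the noise scale below is a naive choice, not the
    paper's calibrated construction."""

    def __init__(self, base_algo, epsilon):
        self.base = base_algo
        self.scale = 1.0 / epsilon  # naive scale for losses in [0, 1]

    def select_arm(self):
        return self.base.select_arm()

    def update(self, arm, loss):
        self.base.update(arm, loss + laplace_noise(self.scale))
```

The point of the wrapper is that the base learner is treated entirely as a black box: any regret guarantee it has against the noised loss sequence can then be related back to the true losses, which is the flavor of reduction the abstract describes.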