Differential Privacy in the Extensive-Form Bandit Problem

📅 2026-05-06

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This study addresses locally differentially private online learning in extensive-form bandits. On each round a user, coordinated by a central server, plays an extensive-form game against an oblivious adversary and observes only the information sets it reaches and the resulting payoff or loss. The work is the first to study differential privacy in this setting, and it proposes an algorithm that satisfies ε-local differential privacy with a theoretical regret guarantee. The approach works with the learner's reduced strategies and injects carefully calibrated privacy-preserving noise, enabling learning under strict privacy constraints. The algorithm achieves a regret bound of Õ(√(A ln(S) T)/ε), where A denotes the total number of actions the learner can take, S the number of the learner's reduced strategies, T the number of trials, and ε the privacy parameter. Its per-round time complexity equals, up to a factor logarithmic in the maximum number of actions at an information set, the time required for the server to transmit the reduced strategy to the user.
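The summary refers to calibrated privacy-preserving noise but does not specify the mechanism. As a generic illustration only, and not the paper's algorithm, the sketch below shows the textbook Laplace mechanism applied to a single loss report. It assumes the loss lies in [0, 1], so adding Laplace(0, 1/ε) noise on the user's side makes that one value ε-locally differentially private. It does not model how the paper handles the rest of what the user observes, such as the information sets visited. The names `privatize_loss` and `laplace_density` are hypothetical.

```python
import math
import random


def privatize_loss(loss: float, epsilon: float, rng: random.Random) -> float:
    """Hypothetical user-side report: release a loss in [0, 1] under epsilon-LDP.

    Adds Laplace(0, 1/epsilon) noise. The difference of two independent
    Exponential(rate=epsilon) draws is exactly Laplace(0, 1/epsilon).
    """
    if not 0.0 <= loss <= 1.0:
        raise ValueError("loss must lie in [0, 1] (assumed range)")
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    return loss + rng.expovariate(epsilon) - rng.expovariate(epsilon)


def laplace_density(y: float, center: float, epsilon: float) -> float:
    """Density of the noisy report y when the true loss is `center`."""
    return 0.5 * epsilon * math.exp(-epsilon * abs(y - center))


if __name__ == "__main__":
    rng = random.Random(0)
    eps = 1.0

    # Privacy: for true losses a, b in [0, 1] and any output y, the density
    # ratio is at most exp(epsilon * |a - b|) <= exp(epsilon).
    worst = max(
        laplace_density(y / 10, 0.0, eps) / laplace_density(y / 10, 1.0, eps)
        for y in range(-50, 51)
    )
    print(f"worst density ratio {worst:.4f} vs e^eps = {math.exp(eps):.4f}")

    # Utility: each report is unbiased, so the server can average many of them.
    reports = [privatize_loss(0.3, eps, rng) for _ in range(200_000)]
    print(f"mean of noisy reports: {sum(reports) / len(reports):.3f}")
```

Each noisy report is unbiased but carries variance 2/ε², so the server needs more reports to estimate losses accurately as ε shrinks. This is consistent with, though not a derivation of, the 1/ε factor in the regret bound.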

📝 Abstract

We consider the extensive-form bandit problem, where on each trial the learner (a user coordinated by a server) plays an extensive-form game against an oblivious adversary, observing the information sets it finds itself in as well as the resulting payoff/loss. We give an algorithm for this problem that satisfies $ε$-local differential privacy and attains a regret of $\tilde{O}(\sqrt{A\ln(S)T}/ε)$, where $A$ is the total number of actions that the learner can possibly take, $S$ is the number of the learner's possible reduced strategies, and $T$ is the number of trials. On each trial, the time complexity of our algorithm is, up to a factor logarithmic in the maximum number of actions at an infoset, equal to the time required for the server to transmit the reduced strategy to the user. We note that local differential privacy is the strongest version of differential privacy and, to the best of our knowledge, this is the first work to study differential privacy of any form in the extensive-form bandit problem.
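To make the dependence of the bound on $A$, $S$, $T$ and $ε$ concrete, the sketch below evaluates $\sqrt{A\ln(S)T}/ε$ with the constants and logarithmic factors hidden by $\tilde{O}$ dropped, so only ratios between values are meaningful. The function name `regret_scale` and the problem sizes are hypothetical.

```python
import math


def regret_scale(A: int, S: int, T: int, epsilon: float) -> float:
    """sqrt(A * ln(S) * T) / epsilon: the bound with the constants and log
    factors hidden by tilde-O dropped (illustrative only)."""
    if A < 1 or S < 2 or T < 1 or epsilon <= 0:
        raise ValueError("need A >= 1, S >= 2, T >= 1, epsilon > 0")
    return math.sqrt(A * math.log(S) * T) / epsilon


if __name__ == "__main__":
    A, S, T, eps = 10, 1_000, 10_000, 1.0  # hypothetical problem sizes
    base = regret_scale(A, S, T, eps)

    # Halving epsilon (stronger privacy) doubles the bound.
    print(f"halving epsilon: x{regret_scale(A, S, T, eps / 2) / base:.2f}")
    # Four times as many trials only doubles it.
    print(f"4x more trials:  x{regret_scale(A, S, 4 * T, eps) / base:.2f}")

    # Average regret per trial shrinks like 1/sqrt(T), i.e. regret is sublinear.
    for t in (10_000, 100_000, 1_000_000):
        print(t, regret_scale(A, S, t, eps) / t)

    # The number of reduced strategies S can be huge; only ln(S) enters.
    print(f"ln(2**60) = {math.log(2**60):.1f}")
```

Because $S$ enters only through $\ln(S)$, even an exponentially large set of reduced strategies contributes a moderate factor, while $ε$ enters linearly in the denominator.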

Problem

Research questions and friction points this paper is trying to address.

extensive-form bandit

differential privacy

local differential privacy

regret minimization

oblivious adversary

Innovation

Methods, ideas, or system contributions that make the work stand out.

local differential privacy

extensive-form bandit

regret bound