Differential Privacy in the Extensive-Form Bandit Problem

📅 2026-05-06
🤖 AI Summary
This study addresses locally differentially private online learning in extensive-form bandits. On each round a user, coordinated by a central server, plays against an oblivious adversary and observes only the information sets it reaches and the resulting payoff/loss. The paper is the first to study differential privacy of any form in this setting, and it proposes a privacy-preserving algorithm with theoretical guarantees. The approach works over the learner's reduced strategies and injects privacy-preserving noise calibrated to the privacy parameter. The algorithm satisfies ε-local differential privacy and achieves a regret bound of Õ(√(A ln(S) T)/ε), where A is the total number of actions the learner can take, S the number of the learner's reduced strategies, T the number of trials, and ε the privacy parameter. Its per-round time complexity equals, up to a factor logarithmic in the maximum number of actions at an information set, the time required for the server to transmit the reduced strategy to the user.
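To illustrate what ε-local differential privacy means for a single released loss, the sketch below applies the standard Laplace mechanism on the user side. It assumes losses lie in [0, 1] (sensitivity 1); the helper names `privatize_loss` and `laplace_density` are hypothetical, and the paper's actual randomizer is not described here and may differ.

```python
import math
import random


def privatize_loss(loss: float, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism: release loss + Laplace(0, 1/epsilon) noise.

    For losses restricted to [0, 1] (sensitivity 1) the released value is
    epsilon-locally differentially private. Hypothetical helper name.
    """
    if not 0.0 <= loss <= 1.0:
        raise ValueError("loss must lie in [0, 1]")
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    # The difference of two i.i.d. Exponential(rate=epsilon) draws is Laplace(0, 1/epsilon).
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return loss + noise


def laplace_density(x: float, true_loss: float, epsilon: float) -> float:
    """Density of the released value x when the user's true loss is true_loss."""
    return 0.5 * epsilon * math.exp(-epsilon * abs(x - true_loss))


if __name__ == "__main__":
    rng = random.Random(0)
    eps = 0.5
    released = [privatize_loss(0.3, eps, rng) for _ in range(100_000)]
    # The noise has mean zero, so averaging many releases recovers the true loss.
    print("mean of releases:", sum(released) / len(released))
    # Likelihood ratio of an output under the extreme inputs 0 and 1 never exceeds e^eps.
    worst = max(laplace_density(x, 0.0, eps) / laplace_density(x, 1.0, eps)
                for x in (-2.0, 0.0, 0.5, 1.0, 3.0))
    print("worst ratio:", worst, "bound:", math.exp(eps))
```

Each individual release reveals little about the true loss, while the noise variance is 2/ε²; the growth of the noise as ε shrinks is one intuition for the 1/ε factor in the regret bound.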
📝 Abstract
We consider the extensive-form bandit problem, where on each trial the learner (a user coordinated by a server) plays an extensive-form game against an oblivious adversary, observing the information sets it finds itself in as well as the resulting payoff/loss. We give an algorithm for this problem that satisfies $ε$-local differential privacy and attains a regret of $\tilde{O}(\sqrt{A\ln(S)T}/ε)$, where $A$ is the total number of actions that the learner can possibly take, $S$ is the number of the learner's possible reduced strategies, and $T$ is the number of trials. On each trial, the time complexity of our algorithm is, up to a factor logarithmic in the maximum number of actions at an infoset, equal to the time required for the server to transmit the reduced strategy to the user. We note that local differential privacy is the strongest version of differential privacy and, to the best of our knowledge, this is the first work to study differential privacy of any form in the extensive-form bandit problem.
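To make the interaction protocol concrete (the user plays, privatizes the observed loss locally, and the server updates), here is a toy sketch that treats each reduced strategy as one arm of a K-armed bandit and runs EXP3 with uniform exploration on Laplace-privatized losses. This is a swapped-in, generic technique, not the authors' algorithm: its regret grows with the number of strategies rather than matching $\tilde{O}(\sqrt{A\ln(S)T}/ε)$. The names `ldp_exp3`, `eta`, and `gamma` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def laplace(rng: random.Random, scale: float) -> float:
    """Sample Laplace(0, scale) as a difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def ldp_exp3(losses: Sequence[Sequence[float]], epsilon: float,
             eta: float, gamma: float, seed: int = 0) -> List[int]:
    """Toy EXP3 with locally privatized bandit feedback (hypothetical helper).

    losses[t][k] in [0, 1] is fixed before play starts (an oblivious adversary);
    each arm stands in for one reduced strategy. Returns the arms played.
    """
    rng = random.Random(seed)
    k = len(losses[0])
    log_w = [0.0] * k  # log-weights, kept in log space to avoid overflow
    played: List[int] = []
    for row in losses:
        # Server: exponential weights mixed with uniform exploration.
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]
        total = sum(w)
        p = [(1.0 - gamma) * x / total + gamma / k for x in w]
        arm = rng.choices(range(k), weights=p)[0]
        played.append(arm)
        # User: only loss + Laplace(1/epsilon) noise leaves the device.
        noisy = row[arm] + laplace(rng, 1.0 / epsilon)
        # Server: unbiased importance-weighted estimate for the played arm.
        log_w[arm] -= eta * noisy / p[arm]
    return played


if __name__ == "__main__":
    T = 6000
    losses = [[0.1, 0.9]] * T  # arm 0 is the better strategy
    played = ldp_exp3(losses, epsilon=1.0, eta=0.05, gamma=0.05, seed=3)
    tail = played[T // 2:]
    print("share of arm 0 in second half:", tail.count(0) / len(tail))
```

Because the Laplace noise has zero mean and the update divides by the play probability, each loss estimate is unbiased; the exploration floor `gamma / k` keeps that division bounded.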
Problem

Research questions and friction points this paper is trying to address.

extensive-form bandit
differential privacy
local differential privacy
regret minimization
oblivious adversary
Innovation

Methods, ideas, or system contributions that make the work stand out.

local differential privacy
extensive-form bandit
regret bound
privacy-preserving learning
game theory