New Rates in Stochastic Decision-Theoretic Online Learning under Differential Privacy

📅 2025-02-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies instance-dependent cumulative regret minimization for stochastic decision-theoretic online learning (K actions, T rounds) under ε-differential privacy (ε-DP). To remove the log T factor present in prior upper bounds, the authors introduce a Bernoulli resampling trick that enforces a monotonicity property on the output of the Report-Noisy-Max mechanism, yielding the first upper bound independent of T: O(log K/Δₘᵢₙ + log²K/ε). They further introduce a weaker deterministic setting, in which the loss vectors are deterministic; there, replacing Laplace noise with Gumbel noise gives an explicit integral form of the regret and matching Θ(log K/ε) upper and lower bounds, tight up to constant factors. Together, these results partially resolve the open problem posed by Hu and Mehta (2024) on the optimal instance-dependent regret rate under ε-DP.

📝 Abstract
Hu and Mehta (2024) posed an open problem: what is the optimal instance-dependent rate for stochastic decision-theoretic online learning (with $K$ actions and $T$ rounds) under $\varepsilon$-differential privacy? Previously, the best known upper and lower bounds were $O\left(\frac{\log K}{\Delta_{\min}} + \frac{\log K\log T}{\varepsilon}\right)$ and $\Omega\left(\frac{\log K}{\Delta_{\min}} + \frac{\log K}{\varepsilon}\right)$, where $\Delta_{\min}$ is the gap between the optimal and the second-best actions. In this paper, we partially address this open problem with two new results. First, we provide an improved upper bound of $O\left(\frac{\log K}{\Delta_{\min}} + \frac{\log^2 K}{\varepsilon}\right)$, removing the $T$-dependency. Second, we introduce the deterministic setting, a weaker version of this open problem in which the received loss vectors are deterministic, so the analysis can focus on the dependence on $\varepsilon$ without sampling error. In the deterministic setting, we prove upper and lower bounds that match at $\Theta\left(\frac{\log K}{\varepsilon}\right)$, whereas a direct application of the analysis and algorithms from the original setting still incurs an extra log factor. Technically, we introduce the Bernoulli resampling trick, which enforces a monotonicity property on the output of the Report-Noisy-Max mechanism and enables a tighter analysis. Moreover, by replacing the Laplace noise with Gumbel noise, we derive an explicit integral form that gives a tight characterization of the regret in the deterministic case.
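The abstract's selection primitive is the Report-Noisy-Max mechanism. Below is a minimal sketch of its standard form (not the paper's full algorithm, which additionally uses Bernoulli resampling): add i.i.d. Gumbel noise to the negated cumulative losses and report the argmax. The function name, the `sensitivity` parameter, and the noise scale `2*sensitivity/epsilon` are the conventional choices from the DP literature, assumed here for illustration.

```python
import numpy as np

def report_noisy_max_gumbel(losses, epsilon, sensitivity=1.0, rng=None):
    """Privately select the index of the (approximately) smallest loss.

    Standard Report-Noisy-Max: perturb each negated loss with i.i.d.
    Gumbel noise of scale 2*sensitivity/epsilon and return the argmax.
    With Gumbel noise this coincides with the exponential mechanism
    and satisfies epsilon-DP.
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = 2.0 * sensitivity / epsilon
    losses = np.asarray(losses, dtype=float)
    noisy_scores = -losses + rng.gumbel(0.0, scale, size=losses.shape)
    return int(np.argmax(noisy_scores))
```

As `epsilon` grows, the noise scale shrinks and the mechanism converges to plain argmin over the losses; small `epsilon` flattens the selection distribution toward uniform.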
Problem

Research questions and friction points this paper is trying to address.

Optimal instance-dependent rate in stochastic online learning
Improved upper bound under differential privacy
Deterministic setting with matching bounds
Innovation

Methods, ideas, or system contributions that make the work stand out.

Improved upper bound removes T-dependency.
Deterministic setting simplifies analysis.
Bernoulli resampling trick enforces monotonicity in Report-Noisy-Max.
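The Gumbel-noise substitution exploits a classical fact (the Gumbel-max trick, not specific to this paper): adding i.i.d. standard Gumbel noise to a score vector and taking the argmax samples exactly from the softmax distribution over those scores, which is why the selection probabilities admit the closed integral form mentioned above. A quick empirical check:

```python
import numpy as np

# Gumbel-max trick: argmax_i(s_i + Gumbel(0, 1)) is distributed as
# softmax(s), giving closed-form selection probabilities.
rng = np.random.default_rng(0)
scores = np.array([1.0, 0.0, -1.0])
n = 200_000

# Draw n noisy copies of the scores and record which index wins.
draws = scores + rng.gumbel(size=(n, 3))
empirical = np.bincount(draws.argmax(axis=1), minlength=3) / n

# Exact softmax probabilities for comparison.
softmax = np.exp(scores) / np.exp(scores).sum()
# empirical selection frequencies match softmax(scores) closely
```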