Optimal Gap-Dependent Regret for Private Stochastic Decision-Theoretic Online Learning

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work determines the optimal gap-dependent regret rate for full-information stochastic decision-theoretic online learning under event-level pure differential privacy. The proposed algorithm combines exponentially growing time blocks, random prefixes, and the exponential mechanism. An entropy-potential analysis decomposes the block-wise regret into a sum of softmax selection errors. The resulting upper bound is the first to match the known lower bound up to constant factors without any restriction on the time horizon. Specifically, for every time horizon $T$, the regret is $O\left(\frac{\log K}{\Delta_{\min}} + \frac{\log K}{\varepsilon}\right)$, resolving an open problem posed at COLT.
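The known lower bound is stated with $\min\{\Delta_{\min},\varepsilon\}$ in the denominator, whereas the upper bound above is a sum of two terms. A short check, assuming only that $\Delta_{\min} > 0$ and $\varepsilon > 0$, suggests why the two forms agree up to a factor of 2:

```latex
% For any a, b > 0 (here a = \Delta_{\min}, b = \varepsilon):
\[
\frac{1}{\min\{a,b\}}
  = \max\left\{\frac{1}{a},\frac{1}{b}\right\}
  \le \frac{1}{a}+\frac{1}{b}
  \le 2\max\left\{\frac{1}{a},\frac{1}{b}\right\}
  = \frac{2}{\min\{a,b\}} .
\]
% Multiplying through by \log K gives
\[
\frac{\log K}{\min\{\Delta_{\min},\varepsilon\}}
  \le \frac{\log K}{\Delta_{\min}}+\frac{\log K}{\varepsilon}
  \le \frac{2\log K}{\min\{\Delta_{\min},\varepsilon\}} .
\]
```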

📝 Abstract

We study stochastic decision-theoretic online learning with full information and event-level pure differential privacy. A COLT open problem of Hu and Mehta asks for the optimal gap-dependent regret rate in this setting. For $K$ actions, losses in $[0,1]$, and a unique best action separated from the second-best action by gap $\Delta_{\min}$, the known lower bound is of order $\frac{\log K}{\min\{\Delta_{\min},\varepsilon\}}$, or equivalently, up to universal constants, of order \[ \frac{\log K}{\Delta_{\min}}+\frac{\log K}{\varepsilon}. \] We give a horizon-free pure-DP algorithm and prove the explicit regret bound \[ \operatorname{Reg}_T \le 1000 \cdot \left(\frac{\log K}{\Delta_{\min}}+\frac{\log K}{\varepsilon}\right) \] for every horizon $T$. The numerical constant is not optimized. The algorithm partitions time into blocks of exponentially increasing size, plays a single action throughout each block, and chooses the next action by an exponential mechanism applied to a data-independent random prefix of the previous block. The random prefix converts block regret into a sum, over all prefix lengths, of softmax selection errors. A single entropy-potential argument controls all privacy-dominated large-gap actions at cost $\log K/\varepsilon$.
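The block structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made for illustration only: blocks double in length starting from 1, the prefix length is drawn uniformly at random, the first action is chosen uniformly at random, and the exponential mechanism uses the standard scaling $\exp(-\varepsilon L_i/2)$ for cumulative losses of sensitivity 1. The names `exponential_mechanism`, `private_block_learner`, and `loss_fn` are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence


def exponential_mechanism(cum_losses: Sequence[float], epsilon: float,
                          rng: random.Random) -> int:
    """Sample index i with probability proportional to exp(-epsilon * L_i / 2).

    Assumed scaling: standard exponential mechanism for a score of sensitivity 1.
    """
    best = min(cum_losses)  # subtract the minimum for numerical stability
    weights = [math.exp(-epsilon * (loss - best) / 2.0) for loss in cum_losses]
    return rng.choices(range(len(cum_losses)), weights=weights, k=1)[0]


def private_block_learner(loss_fn: Callable[[int], Sequence[float]],
                          num_actions: int, horizon: int, epsilon: float,
                          seed: int = 0) -> List[int]:
    """Play one action per block; pick the next action from a random prefix.

    `horizon` only stops the simulation; the selection rule never uses it.
    """
    rng = random.Random(seed)
    actions: List[int] = []
    action = rng.randrange(num_actions)  # assumed: uniform first action
    t, block_len = 0, 1
    while t < horizon:
        length = min(block_len, horizon - t)
        # Prefix length is drawn before any loss in this block is observed,
        # so it is independent of the data (assumed uniform distribution).
        prefix_len = rng.randint(1, block_len)
        prefix_sums = [0.0] * num_actions
        for s in range(length):
            losses = loss_fn(t + s)  # full information: all K losses revealed
            actions.append(action)   # same action for the whole block
            if s < prefix_len:
                for i, loss in enumerate(losses):
                    prefix_sums[i] += loss
        t += length
        action = exponential_mechanism(prefix_sums, epsilon, rng)
        block_len *= 2  # assumed: doubling block lengths
    return actions
```

For example, `private_block_learner(lambda t: [0.0, 1.0, 1.0], 3, 20, 1.0)` returns a list of 20 actions that is constant within each of the blocks of lengths 1, 2, 4, 8 and the truncated final block. Each loss vector feeds into at most one call to the exponential mechanism, which is the intuition behind an event-level privacy guarantee for this kind of scheme; the paper's formal argument may differ.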

Problem

Research questions and friction points this paper is trying to address.

differential privacy

online learning

stochastic bandits

gap-dependent regret

COLT open problem

Innovation

Methods, ideas, or system contributions that make the work stand out.

differential privacy

gap-dependent regret

online learning