🤖 AI Summary
This work addresses the limitations of existing nonasymptotic concentration bounds for the empirical $\mathrm{KL}_{\inf}$ statistic: they are suboptimal in both constants and rates, typically require bounded data, and fail to capture the statistic's asymptotic fluctuations. By combining tools from probabilistic limit theory and information geometry, the paper establishes, for the first time, a sharp law of the iterated logarithm (LIL) for the empirical $\mathrm{KL}_{\inf}$ statistic under extremely general conditions, including unbounded observations, with matching upper and lower bounds. This result overcomes the longstanding trade-off between generality and optimality in classical concentration inequalities and provides a rigorous theoretical foundation for (asymptotically) optimal stochastic arm-selection algorithms and sequential hypothesis testing.
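For context, the following is the standard definition of $\mathrm{KL}_{\inf}$ from the bandit literature (the paper's exact setup, e.g. the class $\mathcal{F}$ of candidate distributions or whether the mean constraint is strict, may differ in details). For a distribution $P$ and a threshold $\mu$ above the mean of $P$,

$$\mathrm{KL}_{\inf}(P, \mu) \;=\; \inf\left\{ \mathrm{KL}(P, Q) \,:\, Q \in \mathcal{F},\ \mathbb{E}_{Q}[X] \ge \mu \right\},$$

i.e., the smallest Kullback–Leibler divergence needed to perturb $P$ into a distribution whose mean reaches $\mu$. The empirical statistic replaces $P$ by the empirical distribution $\hat{P}_n$ of the first $n$ observations.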
📝 Abstract
The population $\mathrm{KL}_{\inf}$ is a fundamental quantity that appears in lower bounds for the (asymptotically) optimal regret of pure-exploration stochastic bandit algorithms and the optimal stopping times of sequential tests. Motivated by this, an empirical $\mathrm{KL}_{\inf}$ statistic is frequently used in the design of (asymptotically) optimal bandit algorithms and sequential tests. While nonasymptotic concentration bounds for the empirical $\mathrm{KL}_{\inf}$ have been developed, their optimality in terms of constants and rates is questionable, and their generality is limited (usually to bounded observations). The fundamental limits of nonasymptotic concentration are often described by the asymptotic fluctuations of the statistic. With that motivation, this paper presents a tight (upper and lower) law of the iterated logarithm for the empirical $\mathrm{KL}_{\inf}$ that applies to extremely general (unbounded) data.
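To make the statistic concrete, below is a minimal sketch (not from the paper) of computing the empirical $\mathrm{KL}_{\inf}$ for observations in $[0,1]$, using the well-known dual formulation of Honda and Takemura, $\mathrm{KL}_{\inf}(\hat{P}_n, \mu) = \max_{0 \le \lambda \le 1/(1-\mu)} \mathbb{E}_{\hat{P}_n}[\log(1 - \lambda(X - \mu))]$. This bounded-support recipe is purely illustrative: the paper's LIL concerns the fluctuations of exactly this kind of statistic, of order $\sqrt{\log\log n / n}$ around the population value, and notably extends to unbounded data, where this dual does not directly apply. The function name below is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def empirical_kl_inf(samples, mu, eps=1e-9):
    """Empirical KL_inf(P_hat_n, mu) for data in [0, 1], via the dual
    max over lam in [0, 1/(1-mu)] of mean(log(1 - lam*(x - mu)))."""
    x = np.asarray(samples, dtype=float)
    assert 0.0 < mu < 1.0, "threshold must lie in (0, 1)"
    lam_hi = (1.0 - eps) / (1.0 - mu)  # stay strictly inside the dual domain

    def neg_dual(lam):
        # log1p(-t) = log(1 - t); the argument stays positive for lam < lam_hi
        return -np.mean(np.log1p(-lam * (x - mu)))

    res = minimize_scalar(neg_dual, bounds=(0.0, lam_hi), method="bounded")
    return max(0.0, -res.fun)  # KL_inf is nonnegative; clip optimizer noise

# Sanity check: for Bernoulli(0.3) data and mu = 0.5, the value should
# approach kl(0.3, 0.5) = 0.3*log(0.6) + 0.7*log(1.4) ≈ 0.0823.
rng = np.random.default_rng(0)
print(empirical_kl_inf(rng.binomial(1, 0.3, size=100_000), 0.5))
```

For Bernoulli data the maximizer recovers the binary KL divergence $\mathrm{kl}(p, \mu)$, which gives a convenient sanity check for the dual computation.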