🤖 AI Summary
This work refines PAC-Bayesian bounds on empirical reward estimates for off-policy learning in offline bandit problems. Building on the framework of Seldin et al. (2010), we apply the parameter optimization technique of Rodríguez et al. (2024), which discretizes the space of possible events to tune the "in probability" parameter, and derive two parameter-free PAC-Bayes bounds, one based on the Hoeffding–Azuma inequality and the other on Bernstein's inequality. The resulting bounds are almost optimal: they recover the same convergence rate that would be obtained by setting the "in probability" parameter after the data are realized, without requiring such post-hoc access. This yields stronger theoretical guarantees and improved practical utility for offline policy evaluation.
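Concretely, the empirical reward estimates in question are typically importance-weighted: rewards logged under a behavior policy are reweighted to evaluate a target policy. Below is a minimal sketch of that estimator with a single-policy Hoeffding-style confidence width; the policies, numbers, and width formula are illustrative assumptions, not taken from the paper (whose bounds hold uniformly over posteriors via a KL term).

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_actions = 10_000, 5

# Hypothetical Bernoulli reward means per action (illustrative only).
true_means = rng.uniform(size=n_actions)

# Logged data: behavior policy pi0 is uniform over actions.
pi0 = np.full(n_actions, 1.0 / n_actions)
actions = rng.integers(n_actions, size=n)
rewards = rng.binomial(1, true_means[actions])

# Target policy pi to evaluate: mostly plays action 0, smoothed for overlap.
pi = np.full(n_actions, 0.01)
pi[0] = 1.0 - 0.01 * (n_actions - 1)

# Importance-weighted empirical estimate of the target policy's value.
weights = pi[actions] / pi0[actions]
r_hat = float(np.mean(weights * rewards))

# Two-sided Hoeffding width for this single fixed policy: each term
# weights[i] * rewards[i] lies in [0, w_max]. PAC-Bayes bounds instead hold
# uniformly over posteriors, trading ln(2/delta) for a KL(posterior || prior).
delta = 0.05
w_max = float((pi / pi0).max())
width = w_max * np.sqrt(np.log(2 / delta) / (2 * n))

v_true = float(pi @ true_means)
print(f"IW estimate: {r_hat:.3f} +/- {width:.3f} (true value: {v_true:.3f})")
```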
📝 Abstract
In this paper, we present refined probabilistic bounds on empirical reward estimates for off-policy learning in bandit problems. We build on the PAC-Bayesian bounds from Seldin et al. (2010) and improve on their results using a new parameter optimization approach introduced by Rodríguez et al. (2024). This technique is based on a discretization of the space of possible events to optimize the "in probability" parameter. We provide two parameter-free PAC-Bayes bounds, one based on the Hoeffding–Azuma inequality and the other based on Bernstein's inequality. We prove that our bounds are almost optimal, as they recover the same rate as would be obtained by setting the "in probability" parameter after the realization of the data.
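To make the role of the "in probability" parameter concrete, here is a schematic of the kind of fixed-parameter PAC-Bayes bound this line of work starts from; the notation (λ, prior π, posterior ρ, range c) is an illustrative reconstruction, not quoted from the paper.

```latex
% Schematic fixed-parameter PAC-Bayes--Hoeffding--Azuma bound (illustrative).
% \widehat{R}_n(h) averages n terms whose martingale differences are bounded
% by c. For a fixed \lambda > 0 and prior \pi, with probability at least
% 1 - \delta, simultaneously for all posteriors \rho:
\[
  \mathbb{E}_{\rho}\bigl[ R(h) - \widehat{R}_n(h) \bigr]
  \le \frac{\operatorname{KL}(\rho \,\|\, \pi) + \ln(1/\delta)}{\lambda}
      + \frac{\lambda c^2}{2n}.
\]
% The minimizer \lambda^* = \sqrt{2n(\operatorname{KL}(\rho\|\pi) + \ln(1/\delta))}/c
% depends on \rho, which is only chosen after seeing the data; classic remedies
% union-bound over a grid of \lambda values at an extra logarithmic cost. The
% event-space discretization adopted here recovers the rate of the oracle choice,
% c\sqrt{2(\operatorname{KL}(\rho\|\pi) + \ln(1/\delta))/n}, without that grid.
```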