Partition Tree Weighting for Non-Stationary Stochastic Bandits

📅 2025-02-26
🤖 AI Summary
This paper addresses control policy modeling for non-stationary stochastic Bernoulli multi-armed bandits, proposing an active learning framework grounded in universal source coding to mitigate the “self-delusion” problem that arises when a model does not distinguish its own actions from observations. The core method extends Partition Tree Weighting (PTW), previously confined to passive prediction, to active control, constructing an action-aware, non-stationarity-robust online coding distribution that directly induces control policies. It comprises a hierarchical partition tree structure, action-conditioned sequence modeling, and a dynamic weight update mechanism. Theoretically, the approach achieves a sublinear dynamic regret bound. Empirically, it significantly outperforms mainstream algorithms, including UCB, Exp3, and AdSwitch, under both abrupt and gradual non-stationarity. Key contributions include: (i) the first adaptation of PTW to active sequential decision-making; (ii) a principled coding-theoretic formulation of adaptive control under non-stationarity; and (iii) provably efficient and empirically superior performance in challenging non-stationary bandit settings.

📝 Abstract
This paper considers a generalisation of universal source coding for interaction data, namely data streams that have actions interleaved with observations. Our goal will be to construct a coding distribution that is both universal and can be used as a control policy. Allowing for action generation needs careful treatment, as naive approaches which do not distinguish between actions and observations run into the self-delusion problem in universal settings. We showcase our perspective in the context of the challenging non-stationary stochastic Bernoulli bandit problem. Our main contribution is an efficient and high performing algorithm for this problem that generalises the Partition Tree Weighting universal source coding technique for passive prediction to the control setting.
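For orientation, the passive-prediction form of Partition Tree Weighting that the abstract builds on can be sketched as follows. This is a minimal illustration of the standard recursive definition for binary sequences, assuming a Krichevsky-Trofimov (KT) estimator as the base model; it is not the paper's control algorithm, it ignores actions entirely, and the function names `kt_prob` and `ptw_prob` are hypothetical. Exact fractions are used for clarity, whereas an efficient implementation would update the mixture incrementally.

```python
from fractions import Fraction


def kt_prob(bits):
    """KT block probability of a binary sequence (assumed base model)."""
    p = Fraction(1)
    zeros = ones = 0
    for b in bits:
        seen = ones if b else zeros
        # KT rule: P(next symbol) = (count of that symbol + 1/2) / (n + 1)
        p *= Fraction(2 * seen + 1, 2 * (zeros + ones + 1))
        if b:
            ones += 1
        else:
            zeros += 1
    return p


def ptw_prob(bits, depth):
    """Passive PTW probability of `bits`, for len(bits) <= 2**depth.

    Mixes "no change point in this segment" (base model on the whole
    segment) with "split at the midpoint and recurse on each half".
    """
    if len(bits) > 2 ** depth:
        raise ValueError("sequence longer than 2**depth")
    if depth == 0 or not bits:
        return kt_prob(bits)  # the empty sequence has probability 1
    k = 2 ** (depth - 1)
    whole = kt_prob(bits)
    split = ptw_prob(bits[:k], depth - 1) * ptw_prob(bits[k:], depth - 1)
    return Fraction(1, 2) * whole + Fraction(1, 2) * split


if __name__ == "__main__":
    # A sequence with an abrupt change at the midpoint: PTW assigns it
    # more probability than a single stationary KT model does.
    seq = [0, 0, 0, 0, 1, 1, 1, 1]
    print(float(kt_prob(seq)), float(ptw_prob(seq, 3)))
```

The mixture over binary partitions is what gives the predictor robustness to change points: a segment that is poorly explained by one stationary model can still receive high probability through the split branch.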
Problem

Research questions and friction points this paper is trying to address.

Non-stationary stochastic bandit problem
Universal source coding generalization
Action-observation interaction data handling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalizes Partition Tree Weighting
Handles non-stationary stochastic bandits
Avoids self-delusion in universal settings