🤖 AI Summary
This paper studies the linear contextual bandit problem with partially observable features, where unobserved latent variables induce bias in reward estimation and lead to linear regret growth. To address this challenge, we propose the first prior-free framework that synergistically combines orthogonal basis expansion and doubly robust estimation: an adaptive orthogonal basis is constructed to explicitly model the latent feature subspace corresponding to unknown dimensions, while doubly robust estimation is integrated with linear function approximation to mitigate spurious correlations. We theoretically establish a regret bound of $\tilde{O}(\sqrt{(d + d_h)T})$, where $d$ is the dimension of observed features and $d_h$ is the dimension of the latent subspace, significantly improving upon standard linear and non-contextual bandit algorithms. Extensive experiments demonstrate the method’s superior accuracy in reward estimation and decision-making performance.
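To make the doubly robust idea concrete, the following is a minimal sketch of a generic doubly robust pseudo-reward, not the paper's specific estimator. It assumes a known arm-selection distribution `probs` and an arbitrary imputation `m_hat` for the mean rewards; the function name `dr_pseudo_rewards` and all numerical values are hypothetical. The pseudo-reward fills in every arm with the imputed value and corrects the pulled arm by an inverse-probability-weighted residual, so its average stays close to the true means even when the imputation is wrong.

```python
import numpy as np

def dr_pseudo_rewards(m_hat, actions, rewards, probs):
    """Generic doubly robust pseudo-rewards (illustrative, not the paper's estimator).

    m_hat   : (K,) imputed mean reward per arm (may be inaccurate)
    actions : (n,) index of the arm pulled in each round
    rewards : (n,) observed reward in each round
    probs   : (K,) known probability of pulling each arm
    Returns an (n, K) array with one pseudo-reward per arm per round.
    """
    n = len(actions)
    pseudo = np.tile(np.asarray(m_hat, dtype=float), (n, 1))
    rows = np.arange(n)
    # Correct only the pulled arm with an inverse-probability-weighted residual.
    pseudo[rows, actions] += (rewards - m_hat[actions]) / probs[actions]
    return pseudo

# Hypothetical setup: 4 arms, a deliberately wrong imputation (all zeros).
rng = np.random.default_rng(1)
mu = np.array([0.1, 0.5, -0.2, 0.3])      # true mean rewards (assumed)
m_hat = np.zeros(4)                        # poor imputation on purpose
probs = np.array([0.4, 0.3, 0.2, 0.1])     # known selection probabilities
n = 200_000

actions = rng.choice(4, size=n, p=probs)
rewards = mu[actions] + rng.normal(scale=0.1, size=n)

# Averaging pseudo-rewards over rounds estimates the mean reward of every arm.
avg = dr_pseudo_rewards(m_hat, actions, rewards, probs).mean(axis=0)
print(np.round(avg, 2))
```

The design point is that every round contributes a pseudo-reward for all arms, not only the pulled one, which is what allows such an estimator to be combined with a linear model over all arms.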
📝 Abstract
We introduce a novel linear bandit problem with partially observable features, which results in partial reward information and spurious estimates. If the latent part is not properly addressed, regret may grow linearly in the decision horizon $T$, because the influence of the latent features on rewards is unknown. To tackle this, we propose a novel analysis that handles the latent features and an algorithm that achieves sublinear regret. The core of our algorithm involves (i) augmenting the observed features with basis vectors orthogonal to the observed feature space, and (ii) introducing an efficient doubly robust estimator. Our approach achieves a regret bound of $\tilde{O}(\sqrt{(d + d_h)T})$, where $d$ is the dimension of the observed features and $d_h$ is the unknown dimension of the subspace of the unobserved features. Notably, our algorithm requires no prior knowledge of the unobserved feature space, which may expand as more features become hidden. Numerical experiments confirm that our algorithm outperforms both non-contextual multi-armed bandits and linear bandit algorithms that depend solely on observed features.
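Step (i) can be illustrated with a small linear-algebra sketch. It is an assumption-laden toy, not the paper's construction: it assumes a fixed set of $K$ arms whose observed features form a $d \times K$ matrix `X`, and it augments `X` with an orthonormal basis of the full orthogonal complement of its row space (the paper's bound depends on $d_h$, which may be smaller). The helper names and the synthetic latent features `Z` are hypothetical. The sketch shows that a linear model on the observed features alone cannot represent the mean rewards, while the augmented features can.

```python
import numpy as np

def augment_with_orthogonal_basis(X, tol=1e-10):
    """Stack X (d x K) with an orthonormal basis orthogonal to its row space.

    Illustrative helper: uses the full orthogonal complement in R^K.
    """
    # Rows of Vt beyond rank(X) span the null space of X, which is the
    # orthogonal complement of the row space of X in R^K.
    _, s, Vt = np.linalg.svd(X, full_matrices=True)
    rank = int(np.sum(s > tol * s.max()))
    B = Vt[rank:]                      # shape (K - rank, K)
    return np.vstack([X, B]), B

def fit_residual(F, y):
    """Norm of the least-squares residual when fitting y with features F (p x K)."""
    coef, *_ = np.linalg.lstsq(F.T, y, rcond=None)
    return float(np.linalg.norm(F.T @ coef - y))

# Hypothetical problem: 8 arms, 3 observed and 2 latent feature dimensions.
rng = np.random.default_rng(0)
d, d_latent, K = 3, 2, 8
X = rng.normal(size=(d, K))            # observed features
Z = rng.normal(size=(d_latent, K))     # latent features (never shown to the learner)
theta_x = rng.normal(size=d)
theta_z = rng.normal(size=d_latent)
mu = X.T @ theta_x + Z.T @ theta_z     # true mean reward of each arm

X_aug, B = augment_with_orthogonal_basis(X)

res_observed = fit_residual(X, mu)      # misspecified: latent part is ignored
res_augmented = fit_residual(X_aug, mu) # augmented features span R^K here
print(f"observed-only residual: {res_observed:.3e}")
print(f"augmented residual:     {res_augmented:.3e}")
```

Note that the augmentation uses only `X`; the latent matrix `Z` enters solely through the rewards, which matches the claim that no prior knowledge of the unobserved feature space is needed.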