🤖 AI Summary
This paper studies online decision-making in high-dimensional sparse linear contextual bandits, unifying regret minimization and statistical inference. We establish, for the first time, a fundamental theoretical trade-off between these two objectives under high-dimensional sparse covariates. Under a covariate diversity condition, we propose a novel paradigm that achieves both $O(\log T)$ regret and $\sqrt{T}$-consistent parameter estimation—without explicit exploration. Our method employs lightweight inference via average-weighted debiasing and sample-mean-based estimators. Theoretically, we prove that, in general settings, our approach attains either $O(\sqrt{T})$ regret or $\sqrt{T}$-consistent inference; under diversity, both guarantees hold simultaneously. Extensive experiments on the Warfarin dosage dataset and multiple synthetic benchmarks validate the efficacy and superiority of our method over existing approaches.
📝 Abstract
This paper investigates regret minimization, statistical inference, and their interplay in high-dimensional online decision-making based on the sparse linear contextual bandit model. We integrate the $\varepsilon$-greedy bandit algorithm for decision-making with a hard thresholding algorithm for estimating sparse bandit parameters and introduce an inference framework based on a debiasing method using inverse propensity weighting. Under a margin condition, our method achieves either $O(T^{1/2})$ regret or classical $O(T^{1/2})$-consistent inference, indicating an unavoidable trade-off between exploration and exploitation. If a diverse covariate condition holds, we demonstrate that a pure-greedy bandit algorithm, i.e., one that is exploration-free, combined with a debiased estimator based on average weighting can simultaneously achieve optimal $O(\log T)$ regret and $O(T^{1/2})$-consistent inference. We also show that a simple sample mean estimator can provide valid inference for the optimal policy's value. Numerical simulations and experiments on Warfarin dosing data validate the effectiveness of our methods.
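To make the $\varepsilon$-greedy-plus-hard-thresholding loop concrete, here is a minimal illustrative sketch on synthetic Gaussian contexts. Everything below is an assumption for illustration, not the paper's exact procedure: the problem sizes, the decaying exploration schedule `eps = min(1, 5/t)`, and the ridge-solve-then-threshold step standing in for the paper's sparse estimation routine.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem setup (assumed, not from the paper):
# K arms, ambient dimension d, an s-sparse true parameter theta.
d, K, T, s = 20, 3, 500, 3
theta = np.zeros(d)
theta[:s] = rng.normal(size=s)

def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v; zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-s:]
    out[idx] = v[idx]
    return out

X_hist, y_hist = [], []
theta_hat = np.zeros(d)
regret = 0.0

for t in range(1, T + 1):
    contexts = rng.normal(size=(K, d))            # one context vector per arm
    eps = min(1.0, 5.0 / t)                       # assumed decaying exploration rate
    if rng.random() < eps:
        a = int(rng.integers(K))                  # explore: uniform random arm
    else:
        a = int(np.argmax(contexts @ theta_hat))  # exploit: greedy on current estimate
    reward = contexts[a] @ theta + rng.normal(scale=0.1)
    regret += np.max(contexts @ theta) - contexts[a] @ theta
    X_hist.append(contexts[a])
    y_hist.append(reward)

    # Sparse estimation: a ridge solve followed by hard thresholding
    # (a simple stand-in for the paper's hard thresholding algorithm).
    X, y = np.array(X_hist), np.array(y_hist)
    ridge = np.linalg.solve(X.T @ X + np.eye(d), X.T @ y)
    theta_hat = hard_threshold(ridge, s)

print(f"cumulative regret after T={T}: {regret:.2f}")
print(f"support size of theta_hat: {np.count_nonzero(theta_hat)}")
```

Setting `eps = 0` throughout recovers the pure-greedy (exploration-free) variant discussed under the diverse covariate condition; the debiased inference step built on inverse propensity or average weighting is omitted here for brevity.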