Regret Minimization and Statistical Inference in Online Decision Making with High-dimensional Covariates

📅 2024-11-10
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies online decision-making in high-dimensional sparse linear contextual bandits, unifying regret minimization and statistical inference. We establish, for the first time, a fundamental theoretical trade-off between these two objectives under high-dimensional sparse covariates. Under a covariate diversity condition, we propose a novel paradigm that achieves both $O(\log T)$ regret and $\sqrt{T}$-consistent parameter estimation—without explicit exploration. Our method employs lightweight inference via average-weighted debiasing and sample-mean-based estimators. Theoretically, we prove that, in general settings, our approach attains either $O(\sqrt{T})$ regret or $\sqrt{T}$-consistent inference; under diversity, both guarantees hold simultaneously. Extensive experiments on the Warfarin dosage dataset and multiple synthetic benchmarks validate the efficacy and superiority of our method over existing approaches.

📝 Abstract
This paper investigates regret minimization, statistical inference, and their interplay in high-dimensional online decision-making based on the sparse linear contextual bandit model. We integrate the $\varepsilon$-greedy bandit algorithm for decision-making with a hard thresholding algorithm for estimating sparse bandit parameters and introduce an inference framework based on a debiasing method using inverse propensity weighting. Under a margin condition, our method achieves either $O(T^{1/2})$ regret or classical $O(T^{1/2})$-consistent inference, indicating an unavoidable trade-off between exploration and exploitation. If a covariate diversity condition holds, we demonstrate that a pure-greedy bandit algorithm, i.e., exploration-free, combined with a debiased estimator based on average weighting can simultaneously achieve optimal $O(\log T)$ regret and $O(T^{1/2})$-consistent inference. We also show that a simple sample mean estimator can provide valid inference for the optimal policy's value. Numerical simulations and experiments on Warfarin dosing data validate the effectiveness of our methods.
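The core loop the abstract describes — $\varepsilon$-greedy arm selection combined with hard-thresholded sparse regression — can be sketched as below. This is a minimal toy version, not the paper's exact algorithm: the dimensions, noise level, reward model, and the ridge-then-threshold update are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def hard_threshold(beta, s):
    """Keep the s largest-magnitude entries of beta, zero out the rest."""
    out = np.zeros_like(beta)
    idx = np.argsort(np.abs(beta))[-s:]
    out[idx] = beta[idx]
    return out

d, s, K, T, eps = 20, 3, 2, 2000, 0.05
# True s-sparse parameters, one per arm (toy generative model).
theta = np.zeros((K, d))
theta[:, :s] = rng.normal(size=(K, s))

X_hist = [[] for _ in range(K)]  # contexts observed when each arm was pulled
r_hist = [[] for _ in range(K)]  # corresponding rewards
beta_hat = np.zeros((K, d))      # current sparse estimates

for t in range(T):
    x = rng.normal(size=d)
    # epsilon-greedy: explore uniformly with probability eps, else act greedily.
    if rng.random() < eps:
        a = int(rng.integers(K))
    else:
        a = int(np.argmax(beta_hat @ x))
    r = theta[a] @ x + rng.normal(scale=0.1)
    X_hist[a].append(x)
    r_hist[a].append(r)
    # Re-estimate the pulled arm: ridge least squares, then hard thresholding
    # to enforce s-sparsity of the estimate.
    X = np.array(X_hist[a])
    y = np.array(r_hist[a])
    b = np.linalg.solve(X.T @ X + 0.1 * np.eye(d), X.T @ y)
    beta_hat[a] = hard_threshold(b, s)

err = np.max(np.abs(beta_hat - theta))
print(f"max estimation error: {err:.3f}")
```

Because the contexts here are i.i.d. Gaussian (a "covariate diversity"-style setting), even the small exploration rate leaves every arm with enough diverse samples for the thresholded estimate to recover the true support.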
Problem

Research questions and friction points this paper is trying to address.

Regret minimization in high-dimensional online decision-making
Statistical inference for sparse linear bandit models
Trade-off between exploration and exploitation in bandits
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates ε-greedy algorithm with hard thresholding
Uses debiasing with inverse propensity weighting
Combines pure-greedy algorithm with average weighting
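The inverse-propensity-weighting idea behind the debiasing step can be illustrated on logged $\varepsilon$-greedy data: rewards are reweighted by the known selection probabilities so the average is unbiased for a target arm's mean. The arm-only reward model and all numbers below are toy assumptions, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

# Logged bandit data under a known epsilon-greedy policy: for each round the
# propensity P(arm) is known, so IPW reweighting gives an unbiased estimate
# of any arm's mean reward, even a rarely pulled one.
K, T, eps = 2, 5000, 0.1
mu = np.array([0.3, 0.7])        # true mean rewards (toy, context-free model)
greedy = 1                       # the logging policy's greedy arm

probs = np.full(K, eps / K)
probs[greedy] += 1 - eps         # epsilon-greedy propensities

arms = rng.choice(K, size=T, p=probs)
rewards = mu[arms] + rng.normal(scale=0.2, size=T)

target = 0                       # the rarely pulled arm
ipw = np.mean((arms == target) * rewards / probs[target])
print(f"IPW estimate of mu[{target}]: {ipw:.3f} (true {mu[target]})")
```

In this context-free toy the per-arm sample mean would also be unbiased; the reweighting identity matters when, as in the paper, arm choice depends on the covariates and history, which is exactly when naive averages become biased.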
Congyuan Duan
Department of Mathematics, Hong Kong University of Science and Technology
Wanteng Ma
Department of Statistics and Data Science, University of Pennsylvania
Jiashuo Jiang
Hong Kong University of Science and Technology
operations research, operations management, optimization, approximation algorithms, machine learning
Dong Xia
Hong Kong University of Science and Technology
Machine Learning, Statistics, Optimization, Tensors