🤖 AI Summary
This paper investigates the problem of identifying the arm with the highest variance in multi-armed bandits, addressing both regret minimization and fixed-budget best-arm identification (BAI). We propose two algorithms: UCB-VV for regret minimization and SHVV for BAI. For bounded rewards, UCB-VV achieves an $O(\log n)$ regret upper bound over horizon $n$, which matches the derived lower bound in order. SHVV attains an error-probability upper bound of order $\exp\left(-\frac{n}{\log(K) H}\right)$, where $H$ is the problem complexity, and this rate matches the corresponding lower bound. The framework extends from bounded to sub-Gaussian distributions through a novel concentration inequality for the sample variance, which in turn yields a concentration inequality for the empirical Sharpe ratio. In Monte Carlo simulations, UCB-VV outperforms $\epsilon$-greedy but is surpassed by VTS, which lacks theoretical guarantees, and SHVV outperforms uniform sampling across six experimental setups. A case study evaluates both algorithms on call option trading for 100 stocks simulated with geometric Brownian motion (GBM).
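To make the regret setting concrete, the following is a minimal Python sketch of a UCB-style rule that ranks arms by sample variance plus an exploration bonus. It is not the paper's UCB-VV: the bonus form `c * sqrt(log t / n_i)`, the constant `c`, and the helper names `RunningVariance`, `ucb_on_variance`, and `uniform_arm` are all assumptions made for illustration.

```python
import math
import random


class RunningVariance:
    """Welford's online update for the unbiased sample variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def ucb_on_variance(arms, horizon, c=0.25, seed=0):
    """Pull the arm maximizing (sample variance + exploration bonus).

    `arms` is a list of callables taking a random.Random and returning a
    reward in [0, 1]. The bonus and the default `c` are illustrative
    assumptions, not the constants analyzed in the paper.
    Returns the number of pulls per arm.
    """
    rng = random.Random(seed)
    stats = [RunningVariance() for _ in arms]

    # Two initial pulls per arm so that every sample variance is defined.
    for arm, stat in zip(arms, stats):
        for _ in range(2):
            stat.update(arm(rng))

    for t in range(2 * len(arms) + 1, horizon + 1):
        def index(i):
            bonus = c * math.sqrt(math.log(t) / stats[i].n)
            return stats[i].variance() + bonus

        chosen = max(range(len(arms)), key=index)
        stats[chosen].update(arms[chosen](rng))

    return [s.n for s in stats]


def uniform_arm(width):
    """Uniform reward centered at 0.5; its variance is width**2 / 12."""
    return lambda rng: rng.uniform(0.5 - width / 2, 0.5 + width / 2)


pulls = ucb_on_variance(
    [uniform_arm(0.2), uniform_arm(0.5), uniform_arm(1.0)], horizon=5000
)
print(pulls)  # the widest (highest-variance) arm should receive most pulls
```

In this toy setup the third arm has the largest variance, so a sensible index policy should concentrate its pulls there while sampling the other two arms only occasionally.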
📝 Abstract
This paper focuses on selecting the arm with the highest variance from a set of $K$ independent arms. Specifically, we consider two settings: (i) the regret setting, which penalizes the number of pulls of arms that are suboptimal in terms of variance, and (ii) the fixed-budget best-arm identification (BAI) setting, which evaluates the ability of an algorithm to determine the arm with the highest variance after a fixed number of pulls. We develop a novel online algorithm called UCB-VV for the regret setting and show that its upper bound on regret for bounded rewards evolves as $\mathcal{O}\left(\log{n}\right)$, where $n$ is the horizon. By deriving the lower bound on the regret, we show that UCB-VV is order optimal. For the fixed-budget BAI setting, we propose the SHVV algorithm. We show that the upper bound of the error probability of SHVV evolves as $\exp\left(-\frac{n}{\log(K) H}\right)$, where $H$ represents the complexity of the problem, and this rate matches the corresponding lower bound. We extend the framework from bounded distributions to sub-Gaussian distributions using a novel concentration inequality on the sample variance. Leveraging the same inequality, we derive a concentration inequality for the empirical Sharpe ratio (SR) for sub-Gaussian distributions, which was previously unknown in the literature. Empirical simulations show that UCB-VV consistently outperforms $\epsilon$-greedy across different sub-optimality gaps, though it is surpassed by VTS, which exhibits the lowest regret but lacks theoretical guarantees. We also illustrate the superior performance of SHVV over uniform sampling in the fixed-budget setting under 6 different setups. Finally, we conduct a case study to empirically evaluate the performance of UCB-VV and SHVV in call option trading on $100$ stocks generated using geometric Brownian motion (GBM).
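The fixed-budget setting can be illustrated with a short Python sketch of standard sequential halving in which arms are scored by sample variance instead of sample mean. Whether SHVV uses exactly this schedule is an assumption; the abstract does not give the details. The function name `sequential_halving_variance`, the equal per-round budget split, and the Gaussian test arms are hypothetical choices for illustration.

```python
import math
import random
import statistics


def sequential_halving_variance(arms, budget, seed=0):
    """Return the index of the arm estimated to have the highest variance.

    `arms` is a list of callables taking a random.Random and returning a
    reward. The budget is split evenly over ceil(log2 K) rounds; in each
    round every surviving arm is pulled equally often, and the half with
    the smaller sample variance is eliminated. Only the samples from the
    current round are used for scoring (an assumption of this sketch).
    """
    rng = random.Random(seed)
    survivors = list(range(len(arms)))
    rounds = max(1, math.ceil(math.log2(len(arms))))

    for _ in range(rounds):
        if len(survivors) == 1:
            break
        pulls = budget // (len(survivors) * rounds)
        if pulls < 2:
            raise ValueError("budget too small to estimate a variance")
        scores = {
            i: statistics.variance([arms[i](rng) for _ in range(pulls)])
            for i in survivors
        }
        # Keep the top half (rounded up) by sample variance.
        survivors.sort(key=scores.get, reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]

    return survivors[0]


# Eight zero-mean Gaussian arms; the last has the largest standard deviation.
sds = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 2.0]
arms = [lambda rng, sd=sd: rng.gauss(0.0, sd) for sd in sds]
best = sequential_halving_variance(arms, budget=2400)
print(best)
```

With eight arms the sketch runs three elimination rounds (8 to 4 to 2 to 1), so later rounds spend more pulls per surviving arm, which is the usual motivation for halving over uniform sampling.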