🤖 AI Summary
This work addresses best-arm identification in generalized linear bandits with heterogeneous feedback, where at each round the learner may query either an absolute reward observation from a single arm or a relative pairwise comparison between two arms, both governed by generalized linear models. To handle this mixed feedback in the fixed-confidence setting, the authors propose a unified likelihood-ratio-based confidence sequence that yields an explicit ellipsoidal confidence set under a self-concordance assumption. Building on this, they design a hybrid Track-and-Stop algorithm that adaptively allocates queries by tracking a minimax-optimal design over the joint space of arms and arm pairs, and they extend it to a cost-aware setting in which absolute and relative observations have different acquisition costs. The algorithm is proven to be δ-correct, with a high-probability upper bound on its stopping time. Experiments show improved sample efficiency over baseline methods.
📝 Abstract
We study fixed-confidence best-arm identification in generalized linear bandits under a hybrid feedback model: at each round, the learner may query either (i) absolute reward feedback from a single arm or (ii) relative (dueling) feedback from an arm pair, both governed by generalized linear models. We introduce a likelihood-ratio-based confidence sequence that unifies heterogeneous generalized linear observations and yields an explicit ellipsoidal confidence set under a self-concordance assumption. Building on this confidence set, we propose a hybrid Track-and-Stop algorithm that adaptively allocates queries by tracking a minimax-optimal design over a joint action space of arms and pairs. We establish $\delta$-correctness and provide high-probability upper bounds on the stopping time. We further extend the framework to a cost-aware setting that accounts for heterogeneous acquisition costs across feedback modalities. Empirical experiments demonstrate that the proposed algorithms significantly improve sample efficiency over baseline methods.
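To make the hybrid feedback model concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a logistic link for both feedback types and assumes that a duel between arms i and j is driven by the feature difference x_i - x_j; the abstract specifies neither. The names `joint_actions` and `query`, the arm features, and the parameter `theta` are all hypothetical.

```python
import itertools
import numpy as np


def sigmoid(z):
    """Logistic link (an assumed choice of GLM link function)."""
    return 1.0 / (1.0 + np.exp(-z))


def joint_actions(arms):
    """Build the joint action space: single arms plus unordered arm pairs.

    Each action is (kind, indices, feature). For a pair (i, j) the feature
    is assumed to be the difference arms[i] - arms[j].
    """
    n_arms = arms.shape[0]
    absolute = [("abs", (i,), arms[i]) for i in range(n_arms)]
    relative = [
        ("rel", (i, j), arms[i] - arms[j])
        for i, j in itertools.combinations(range(n_arms), 2)
    ]
    return absolute + relative


def query(action, theta, rng):
    """Draw one Bernoulli observation for an absolute or relative query."""
    _, _, feature = action
    p = sigmoid(feature @ theta)
    return int(rng.random() < p)


# Hypothetical instance: three arms in two dimensions.
rng = np.random.default_rng(0)
arms = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
theta = np.array([1.0, -0.5])  # unknown to the learner in practice

actions = joint_actions(arms)
best_arm = int(np.argmax(arms @ theta))
observation = query(actions[3], theta, rng)  # duel between arms 0 and 1
```

With K arms the joint action space has K + K(K-1)/2 elements, which is the set over which a design (a distribution over queries) would be tracked. Under these assumptions both feedback types reduce to observations of the same parameter through different feature vectors, which is what allows a single confidence set to cover them.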