Best Arm Identification in Generalized Linear Bandits via Hybrid Feedback

📅 2026-05-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses best-arm identification in generalized linear bandits with heterogeneous feedback. At each round, the learner chooses either an absolute reward observation from a single arm or a relative pairwise comparison between two arms, and both are governed by a generalized linear model. To handle this mixed feedback in the fixed-confidence setting, the authors propose a unified likelihood-ratio-based confidence sequence that yields explicit ellipsoidal confidence sets under a self-concordance assumption. Building on this, they design an adaptive Track-and-Stop algorithm that tracks a minimax-optimal design over the joint space of arms and arm pairs, and they extend it to cost-aware query allocation between absolute and relative observations. The algorithm is proven to be δ-correct, with a high-probability upper bound on its stopping time. Experiments show substantially better sample efficiency than existing baseline methods.
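The allocation step described above can be illustrated with a minimal sketch. It rests on assumptions that the summary does not state. A relative query on pair (i, j) is represented by the feature difference x_i - x_j. The paper's minimax-optimal design is replaced by a classical D-optimal design, which is equivalent to G-optimal by the Kiefer-Wolfowitz theorem, computed with Fedorov-Wynn iterations. Queries follow a cumulative tracking rule in the style of Track-and-Stop. The cost-aware variant is also an assumption: it rescales each feature z_a by 1/sqrt(c_a), so the design allocates budget rather than samples. All function names are hypothetical.

```python
import numpy as np

def joint_actions(X):
    """Hypothetical joint action space: absolute z = x_i, relative z = x_i - x_j (i < j)."""
    K = len(X)
    feats, labels = [], []
    for i in range(K):
        feats.append(X[i]); labels.append(("abs", i))
    for i in range(K):
        for j in range(i + 1, K):
            feats.append(X[i] - X[j]); labels.append(("rel", i, j))
    return np.array(feats), labels

def variances(Z, w):
    """||z_a||^2 in the A(w)^{-1} norm, where A(w) = sum_a w_a z_a z_a^T."""
    A = Z.T @ (w[:, None] * Z)
    return np.einsum("ij,jk,ik->i", Z, np.linalg.inv(A), Z)

def max_variance(Z, w):
    return float(np.max(variances(Z, w)))

def d_optimal_design(Z, iters=5000, tol=1e-3):
    """Fedorov-Wynn steps with exact line search (stand-in for the paper's minimax design)."""
    n, d = Z.shape
    w = np.full(n, 1.0 / n)
    for _ in range(iters):
        g = variances(Z, w)
        a = int(np.argmax(g))
        if g[a] <= d * (1 + tol):  # Kiefer-Wolfowitz: the optimum has max variance d
            break
        gamma = (g[a] - d) / (d * (g[a] - 1))  # step that maximizes log det A(w)
        w = (1 - gamma) * w
        w[a] += gamma
    return w

def cost_aware_proportions(Z, costs, iters=5000):
    """Assumed variant: budget shares nu from a design on z_a / sqrt(c_a), sample shares prop. to nu_a / c_a."""
    nu = d_optimal_design(Z / np.sqrt(costs)[:, None], iters)
    p = nu / costs
    return p / p.sum()

def track(w, counts):
    """Cumulative tracking: query the action whose count lags furthest behind t * w_a."""
    t = counts.sum() + 1
    return int(np.argmax(t * w - counts))

if __name__ == "__main__":
    X = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
    Z, labels = joint_actions(X)
    w = d_optimal_design(Z)
    counts = np.zeros(len(w))
    for _ in range(100):
        counts[track(w, counts)] += 1
    print(dict(zip(labels, counts)))
```

Under this rule each count stays between t·w_a - (K - 1) and t·w_a + 1, where K is the number of actions, so query frequencies converge to the design. In a full Track-and-Stop loop the design would be recomputed from the current estimate instead of being held fixed.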
📝 Abstract
We study fixed-confidence best arm identification in generalized linear bandits under a hybrid feedback model: at each round, the learner may query either (i) absolute reward feedback from a single arm or (ii) relative (dueling) feedback from an arm pair, both governed by generalized linear models. We introduce a likelihood-ratio-based confidence sequence that unifies heterogeneous generalized linear observations and yields an explicit ellipsoidal confidence set under a self-concordance assumption. Building on this confidence set, we propose a hybrid Track-and-Stop algorithm that adaptively allocates queries by tracking a minimax-optimal design over a joint action space of arms and pairs. We establish $\delta$-correctness and provide high-probability upper bounds on the stopping time. We further extend the framework to a cost-aware setting that accounts for heterogeneous acquisition costs across feedback modalities. Empirical experiments demonstrate that the proposed algorithms significantly improve sample efficiency over baseline methods.
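The confidence-set and stopping logic can be sketched under several assumptions that the abstract does not state. Both feedback types use a logistic link with Bernoulli outcomes. A comparison of arms i and j is modeled as sigmoid((x_i - x_j)^T θ), with the same θ as absolute rewards. The paper's likelihood-ratio confidence sequence is replaced by a simple ellipsoid {θ : ||θ - θ̂||_H ≤ β} around a regularized MLE, where H is the observed information. The radius `beta` and the ridge term `lam` are hypothetical inputs, not the paper's calibrated threshold. Function names are illustrative only.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def simulate(X, theta_star, n, rng):
    """Hypothetical simulator: n absolute pulls per arm and n duels per pair (i < j)."""
    K = len(X)
    feats = [X[i] for i in range(K)] + [X[i] - X[j] for i in range(K) for j in range(i + 1, K)]
    Z = np.repeat(np.array(feats), n, axis=0)
    # y = 1 means reward 1 (absolute row) or "arm i beats arm j" (relative row).
    y = (rng.random(len(Z)) < sigmoid(Z @ theta_star)).astype(float)
    return Z, y

def information(Z, theta, lam):
    """Observed information: sum of mu'(z^T theta) z z^T over pooled rows, plus lam * I."""
    p = sigmoid(Z @ theta)
    return Z.T @ ((p * (1 - p))[:, None] * Z) + lam * np.eye(Z.shape[1])

def hybrid_mle(Z, y, lam=1.0, iters=50):
    """Newton's method on the joint ridge-regularized logistic log-likelihood."""
    theta = np.zeros(Z.shape[1])
    for _ in range(iters):
        grad = Z.T @ (sigmoid(Z @ theta) - y) + lam * theta
        theta = theta - np.linalg.solve(information(Z, theta, lam), grad)
    return theta, information(Z, theta, lam)

def stop_and_recommend(X, theta, H, beta):
    """Stop if every theta in the ellipsoid ||theta - theta_hat||_H <= beta ranks the same arm first."""
    best = int(np.argmax(X @ theta))
    H_inv = np.linalg.inv(H)
    for j in range(len(X)):
        if j == best:
            continue
        diff = X[best] - X[j]
        if diff @ theta <= beta * np.sqrt(diff @ H_inv @ diff):
            return False, best
    return True, best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
    Z, y = simulate(X, np.array([1.0, -0.5]), 500, rng)
    theta, H = hybrid_mle(Z, y)
    print(theta, stop_and_recommend(X, theta, H, beta=3.0))
```

The stopping test is sound for every θ in the ellipsoid. By Cauchy-Schwarz, (x_best - x_j)^T θ ≥ (x_best - x_j)^T θ̂ - β ||x_best - x_j||_{H^{-1}}, which is positive whenever the check passes.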
Problem

Research questions and friction points this paper is trying to address.

Best Arm Identification
Generalized Linear Bandits
Hybrid Feedback
Fixed-Confidence
Sample Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalized Linear Bandits
Best Arm Identification
Hybrid Feedback
Likelihood Ratio Confidence Sequence
Cost-Aware Learning
🔎 Similar Papers
2024-06-05 · Neural Information Processing Systems · Citations: 1
2024-07-24 · arXiv.org · Citations: 4