Best Arm Identification in Generalized Linear Bandits via Hybrid Feedback

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses best-arm identification in generalized linear bandits with heterogeneous feedback: at each round the learner chooses either an absolute reward observation from a single arm or a relative pairwise comparison between two arms, both governed by a generalized linear model. To handle this mixed feedback in the fixed-confidence setting, the authors propose a unified likelihood-ratio-based confidence sequence that yields an explicit ellipsoidal confidence set under a self-concordance assumption. Building on this, they design a hybrid Track-and-Stop algorithm that allocates queries adaptively by tracking a minimax-optimal design over the joint action space of arms and pairs, and they extend it to a cost-aware setting in which absolute and relative observations have different acquisition costs. The algorithm is proven to be δ-correct, with high-probability upper bounds on its stopping time. Experiments show significantly better sample efficiency than baseline methods.
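The hybrid feedback model can be made concrete with a small simulation. The sketch below is illustrative only and is not taken from the paper: it assumes a logistic link for both modalities, a shared parameter vector `theta`, and the feature difference `x_i - x_j` for a duel. The names `query_features` and `observe` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic link, assumed here for both feedback types."""
    return 1.0 / (1.0 + np.exp(-z))

def query_features(arms, query):
    """Map a query to a feature vector.

    ("abs", i)     -> x_i         (absolute feedback from arm i)
    ("duel", i, j) -> x_i - x_j   (relative feedback: does i beat j?)
    """
    kind = query[0]
    if kind == "abs":
        return arms[query[1]]
    if kind == "duel":
        return arms[query[1]] - arms[query[2]]
    raise ValueError(f"unknown query type: {kind!r}")

def observe(arms, theta, query, rng):
    """Draw one Bernoulli observation under the assumed logistic GLM."""
    z = query_features(arms, query)
    return int(rng.random() < sigmoid(z @ theta))

# Toy instance: three arms in R^2 and a hypothetical true parameter.
arms = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
theta = np.array([1.0, 0.5])
rng = np.random.default_rng(0)

reward = observe(arms, theta, ("abs", 0), rng)       # absolute feedback from arm 0
outcome = observe(arms, theta, ("duel", 0, 1), rng)  # 1 means arm 0 beat arm 1
```

Under these assumptions both query types are informative about the same `theta`, which is what lets a single confidence set pool absolute and relative observations.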

📝 Abstract

We study fixed-confidence best arm identification in generalized linear bandits under a hybrid feedback model: at each round, the learner may query either (i) absolute reward feedback from a single arm or (ii) relative (dueling) feedback from an arm pair, both governed by generalized linear models. We introduce a likelihood-ratio-based confidence sequence that unifies heterogeneous generalized linear observations and yields an explicit ellipsoidal confidence set under a self-concordance assumption. Building on this confidence set, we propose a hybrid Track-and-Stop algorithm that adaptively allocates queries by tracking a minimax-optimal design over a joint action space of arms and pairs. We establish $\delta$-correctness and provide high-probability upper bounds on the stopping time. We further extend the framework to a cost-aware setting that accounts for heterogeneous acquisition costs across feedback modalities. Empirical experiments demonstrate that the proposed algorithms significantly improve sample efficiency over baseline methods.
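To illustrate what a design over a joint action space of arms and pairs can look like, the sketch below evaluates a classical linear-bandit criterion: the largest squared ellipsoidal width of any arm gap under a given allocation. This is an assumption-laden stand-in and not the paper's objective. It ignores the GLM curvature weights and any cost terms, and the names `design_matrix` and `worst_gap_width` are hypothetical.

```python
import numpy as np

def design_matrix(actions, w, reg=0.0):
    """V(w) = sum_a w_a z_a z_a^T over joint actions (arm and pair features)."""
    actions = np.asarray(actions, dtype=float)
    w = np.asarray(w, dtype=float)
    d = actions.shape[1]
    return (actions * w[:, None]).T @ actions + reg * np.eye(d)

def worst_gap_width(arms, actions, w, reg=0.0):
    """max over i < j of (x_i - x_j)^T V(w)^{-1} (x_i - x_j)."""
    arms = np.asarray(arms, dtype=float)
    V = design_matrix(actions, w, reg)
    worst = 0.0
    for i in range(len(arms)):
        for j in range(i + 1, len(arms)):
            diff = arms[i] - arms[j]
            worst = max(worst, float(diff @ np.linalg.solve(V, diff)))
    return worst

# Two orthogonal arms; joint actions are both arms plus their duel.
arms = np.eye(2)
actions = np.vstack([arms, arms[0] - arms[1]])

only_absolute = worst_gap_width(arms, actions, [0.5, 0.5, 0.0])
mixed = worst_gap_width(arms, actions, [0.25, 0.25, 0.5])
```

In this toy instance the mixed allocation gives a smaller width than the absolute-only one, because the duel measures the gap direction directly. A minimax-optimal design would minimize such a criterion over the allocation `w`; the optimization step is omitted here.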

Problem

Research questions and friction points this paper is trying to address.

Best Arm Identification

Generalized Linear Bandits

Hybrid Feedback

Fixed-Confidence

Sample Efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalized Linear Bandits

Best Arm Identification

Hybrid Feedback