🤖 AI Summary
This work addresses the challenge of multi-user contextual bandits where users exhibit graph-structured relationships and the reward function is both nonlinear and graph-homophilous. The authors propose a unified learning framework based on joint regularization, introducing a novel regularizer that combines graph smoothness and individual roughness penalties. They establish, for the first time, its equivalence to a norm in a single multi-user reproducing kernel Hilbert space (RKHS) and explicitly construct a composite kernel that integrates the graph Laplacian with the base arm kernel. Building on this, they develop two efficient exploration algorithms, LK-GP-UCB and LK-GP-TS. Theoretical analysis yields high-probability regret bounds dependent only on the effective dimension of the multi-user kernel, eliminating dependence on the number of users or ambient dimensionality. Experiments demonstrate significant superiority over existing baselines in nonlinear settings while maintaining competitive performance in linear cases.
📝 Abstract
We study multi-user contextual bandits where users are related by a graph and their reward functions exhibit both non-linear behavior and graph homophily. We introduce a principled joint penalty for the collection of user reward functions $\{f_u\}$, combining a graph smoothness term based on RKHS distances with an individual roughness penalty. Our central contribution is proving that this penalty is equivalent to the squared norm within a single, unified \emph{multi-user RKHS}. We explicitly derive its reproducing kernel, which elegantly fuses the graph Laplacian with the base arm kernel. This unification allows us to reframe the problem as learning a single ``lifted'' function, enabling the design of principled algorithms, \texttt{LK-GP-UCB} and \texttt{LK-GP-TS}, that leverage Gaussian Process posteriors over this new kernel for exploration. We provide high-probability regret bounds that scale with an \emph{effective dimension} of the multi-user kernel, replacing dependencies on user count or ambient dimension. Empirically, our methods outperform strong linear and non-graph-aware baselines in non-linear settings and remain competitive even when the true rewards are linear. Our work delivers a unified, theoretically grounded, and practical framework that bridges Laplacian regularization with kernelized bandits for structured exploration.
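To make the construction concrete, here is a minimal sketch of the kind of composite kernel and UCB-style acquisition the abstract describes. All specifics are assumptions for illustration, not the paper's exact formulas: we assume the joint penalty $\alpha \sum_{(u,v)\in E} \|f_u - f_v\|_{\mathcal{H}}^2 + \beta \sum_u \|f_u\|_{\mathcal{H}}^2$ induces, by standard Laplacian algebra, a multi-user kernel of the separable form $K((u,x),(v,x')) = [(\alpha L + \beta I)^{-1}]_{uv}\, k(x,x')$, with $L$ the graph Laplacian and $k$ an RBF base arm kernel; the function names (`multi_user_kernel`, `gp_ucb_scores`) and hyperparameter values are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    """Base arm kernel k(x, x'): squared-exponential (an assumed choice)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def multi_user_kernel(users_a, X_a, users_b, X_b, L, alpha=1.0, beta=1.0,
                      lengthscale=1.0):
    """Composite kernel on (user, arm) pairs: graph factor times arm factor.

    The graph factor (alpha*L + beta*I)^{-1} encodes homophily: neighboring
    users share statistical strength; beta keeps each user's own penalty.
    """
    G = np.linalg.inv(alpha * L + beta * np.eye(L.shape[0]))
    return G[np.ix_(users_a, users_b)] * rbf_kernel(X_a, X_b, lengthscale)

def gp_ucb_scores(users_hist, X_hist, y_hist, users_cand, X_cand, L,
                  noise=0.1, beta_t=2.0, **kw):
    """LK-GP-UCB-style acquisition (a sketch, not the authors' code):
    posterior mean + beta_t * posterior std under a GP with the composite
    kernel; the chosen arm would be the candidate maximizing this score."""
    K = multi_user_kernel(users_hist, X_hist, users_hist, X_hist, L, **kw)
    Ks = multi_user_kernel(users_cand, X_cand, users_hist, X_hist, L, **kw)
    Kss = multi_user_kernel(users_cand, X_cand, users_cand, X_cand, L, **kw)
    A = K + noise**2 * np.eye(len(y_hist))          # noisy Gram matrix
    mu = Ks @ np.linalg.solve(A, y_hist)            # posterior mean
    cov = Kss - Ks @ np.linalg.solve(A, Ks.T)       # posterior covariance
    sigma = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mu + beta_t * sigma
```

A Thompson-sampling variant (\texttt{LK-GP-TS}-style) would instead draw a sample from the posterior $\mathcal{N}(\mu, \mathrm{cov})$ and pick its argmax; both strategies explore through the same composite-kernel posterior.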