Geometry-Aware Offline-to-Online Learning in Linear Contextual Bandits

📅 2026-04-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses parameter mismatch caused by biased offline data in offline-to-online learning for linear contextual bandits and proposes the Ellipsoidal-MINUCB algorithm. The method combines a standard online branch with an offline-informed pooled branch and uses offline information only when it tightens uncertainty. In place of the conventional isotropic radius, it builds a geometry-aware ellipsoidal confidence region from a directional shift certificate and offline ridge estimation. The main results are a high-probability regret bound equal to the minimum of a SupLinUCB-style fallback and a pooled term that separates statistical width from a certificate-weighted shift penalty, and an extension in which the certificate is learned from data at finitely many refresh times. Experiments show the largest gains at intermediate horizons when offline coverage and transferability align; otherwise the method tracks the safe online baseline.

📝 Abstract

We study offline-to-online learning in linear contextual bandits with biased offline regression data: the offline parameter need not match the online one, so history should not be treated as a single warm start. We model directional transfer with a shift certificate $(M_{\mathrm{shift}},\rho)$ and offline ridge estimation, yielding a geometry-aware confidence region for the online parameter rather than an isotropic radius. We propose Ellipsoidal-MINUCB, which combines a standard online branch with an offline-informed pooled branch and uses offline information only when it tightens uncertainty. With high probability, regret is bounded by the minimum of a standard SupLinUCB-style fallback and a pooled term that separates statistical width from a certificate-weighted shift penalty. Under a simple alignment condition, the pooled term further simplifies to a rate governed by an effective dimension induced by the offline geometry. We also show that a purely Euclidean (scalar) shift bound, by itself, does not determine which feature directions are transferable. Beyond this fixed certificate, we show how to learn a certificate from data at finitely many refresh times and establish a high-probability regret bound for Ellipsoidal-MINUCB with epoch-wise learned certificates. Experiments match the main prediction: gains are strongest at intermediate horizons when offline coverage and transferability align, while the method otherwise tracks the safe online baseline.
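
The "use offline information only when it tightens uncertainty" idea can be illustrated by taking the minimum of two upper confidence bounds. The sketch below is not the paper's algorithm; it rests on several simplifying assumptions. The offline-informed branch uses the offline ridge estimate alone rather than a pooled estimate, the certificate is assumed to mean $(\theta_{\mathrm{on}}-\theta_{\mathrm{off}})^\top M_{\mathrm{shift}} (\theta_{\mathrm{on}}-\theta_{\mathrm{off}}) \le \rho^2$, and a single width multiplier `beta` is shared by both branches. The function names `ridge_stats`, `mahalanobis_width`, and `min_ucb` are hypothetical.

```python
import numpy as np


def ridge_stats(X, y, lam=1.0):
    """Return the regularized Gram matrix V and the ridge estimate theta_hat."""
    d = X.shape[1]
    V = lam * np.eye(d) + X.T @ X
    theta_hat = np.linalg.solve(V, X.T @ y)
    return V, theta_hat


def mahalanobis_width(x, A):
    """Compute sqrt(x^T A^{-1} x) for a positive definite matrix A."""
    return float(np.sqrt(x @ np.linalg.solve(A, x)))


def min_ucb(x, online, offline, beta, M_shift, rho):
    """Return (min of the two UCBs, online UCB, offline-informed UCB).

    Assumed certificate form (an illustration, not taken from the paper):
    the parameter shift delta satisfies delta^T M_shift delta <= rho^2,
    so |x^T delta| <= rho * sqrt(x^T M_shift^{-1} x) by Cauchy-Schwarz.
    """
    V_on, theta_on = online
    V_off, theta_off = offline
    # Online branch: estimate plus ellipsoidal statistical width.
    ucb_online = float(x @ theta_on) + beta * mahalanobis_width(x, V_on)
    # Offline-informed branch: statistical width plus a directional shift penalty.
    ucb_offline = (
        float(x @ theta_off)
        + beta * mahalanobis_width(x, V_off)
        + rho * mahalanobis_width(x, M_shift)
    )
    return min(ucb_online, ucb_offline), ucb_online, ucb_offline
```

Under these assumptions, the offline-informed bound wins only along directions where the offline data are plentiful and `M_shift` is large (small permitted shift); along other directions the penalty dominates and the minimum falls back to the online bound.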

Problem

Research questions and friction points this paper is trying to address.

offline-to-online learning

linear contextual bandits

biased offline data

parameter shift

transferability

Innovation

Methods, ideas, or system contributions that make the work stand out.

geometry-aware learning

offline-to-online transfer

shift certificate