Signature Approach for Contextual Bandits with Nonlinear and Path-dependent Rewards

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of modeling nonlinear, path-dependent rewards in contextual bandits by using the signature transform, which maps a raw trajectory to a sequence of iterated-integral features. By the universal nonlinearity property of signatures, continuous path-dependent reward functionals can be approximated by linear functionals of these features. Building on this representation, the authors propose DisSigUCB, a signature-based disjoint upper confidence bound (UCB) algorithm that applies linear contextual bandit methods in the signature space. Under boundedness and non-degeneracy assumptions, they prove a high-probability, data-dependent sublinear regret bound of order Õ(√((d+m)KT)), where d is the context dimension, m is the signature feature dimension, K is the number of arms, and T is the time horizon. Synthetic experiments and numerical applications to temperature sensor monitoring, sleep-stage classification, and hospital nurse staffing show that DisSigUCB consistently outperforms classical linear and kernelized contextual bandit baselines in nonlinear and path-dependent settings.
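To make the signature representation concrete, here is a minimal sketch of a depth-2 truncated signature for a piecewise-linear path, computed segment by segment with Chen's identity. The function name `signature_level2`, the depth-2 truncation, and the piecewise-linear interpolation are assumptions made for illustration; the summary does not say which truncation depth or path construction the authors use.

```python
import numpy as np

def signature_level2(path):
    """Depth-2 truncated signature of a piecewise-linear path (illustrative).

    path: array-like of shape (n_points, dim), the sampled trajectory.
    Returns a vector of length dim + dim**2: the level-1 terms (total
    increments) followed by the flattened level-2 iterated integrals.
    """
    path = np.asarray(path, dtype=float)
    increments = np.diff(path, axis=0)      # one increment per linear segment
    dim = path.shape[1]
    level1 = np.zeros(dim)
    level2 = np.zeros((dim, dim))
    for dx in increments:
        # Chen's identity for appending one linear segment:
        #   S2 <- S2 + S1 (x) dx + (dx (x) dx) / 2,   S1 <- S1 + dx
        level2 += np.outer(level1, dx) + 0.5 * np.outer(dx, dx)
        level1 += dx
    return np.concatenate([level1, level2.ravel()])

# An L-shaped path in the plane: right one unit, then up one unit.
features = signature_level2([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
```

The two off-diagonal level-2 terms differ whenever the order of movement matters, which is the kind of path-dependent information a linear model on raw endpoints cannot see.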

📝 Abstract

We study contextual bandits with nonlinear and path-dependent rewards through a novel signature-transform-based approach. Leveraging the universal nonlinearity property of signatures, we approximate continuous path-dependent reward functionals by linear functionals in the signature space. This representation enables the use of efficient linear contextual bandit methods while preserving expressive sequential structure. Building on this framework, we propose \texttt{DisSigUCB}, a signature-based disjoint upper confidence bound (UCB) algorithm. Under boundedness and non-degeneracy assumptions, we prove a high-probability data-dependent sublinear regret bound of order $\tilde{\mathcal O}(\sqrt{(d+m)KT})$, where $d$ is the context dimension and $m$ is the signature feature dimension. Synthetic experiments and numerical applications on temperature sensor monitoring, sleep-stage classification, and hospital nurse staffing demonstrate that \texttt{DisSigUCB} consistently outperforms classical linear and kernelized contextual bandit baselines in nonlinear and path-dependent settings.
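As a rough illustration of what a disjoint UCB rule looks like, the sketch below implements the standard disjoint LinUCB scheme, in which each arm keeps its own ridge-regression estimate and confidence width. It is not the authors' \texttt{DisSigUCB}: the class name `DisjointLinUCB`, the exploration weight `alpha`, and the ridge parameter `lam` are hypothetical, and the feature vector `x` is simply assumed to be the context concatenated with signature features (length $d+m$).

```python
import numpy as np

class DisjointLinUCB:
    """Standard disjoint LinUCB: one independent linear model per arm."""

    def __init__(self, n_arms, n_features, alpha=1.0, lam=1.0):
        self.alpha = alpha  # exploration weight (assumed hyperparameter)
        # Per-arm regularized Gram matrix A_k and response vector b_k.
        self.A = np.stack([lam * np.eye(n_features) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, n_features))

    def select(self, x):
        """Return the arm with the largest upper confidence bound at x."""
        x = np.asarray(x, dtype=float)
        scores = np.empty(len(self.A))
        for k in range(len(self.A)):
            theta = np.linalg.solve(self.A[k], self.b[k])   # ridge estimate
            width = np.sqrt(x @ np.linalg.solve(self.A[k], x))
            scores[k] = theta @ x + self.alpha * width
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Fold one observed (features, reward) pair into the chosen arm."""
        x = np.asarray(x, dtype=float)
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Usage: x would hold d context entries followed by m signature entries.
bandit = DisjointLinUCB(n_arms=3, n_features=5)
x = np.array([0.2, 0.1, 0.5, 0.0, 0.3])
arm = bandit.select(x)
bandit.update(arm, x, reward=1.0)
```

Because the models are disjoint, an observation for one arm never changes another arm's estimate, so each arm must be explored on its own.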

Problem

Research questions and friction points this paper is trying to address.

contextual bandits

nonlinear rewards

path-dependent rewards

sequential decision-making

Innovation

Methods, ideas, or system contributions that make the work stand out.

signature transform

contextual bandits

path-dependent rewards

nonlinear rewards