Signature Approach for Contextual Bandits with Nonlinear and Path-dependent Rewards

📅 2026-05-11
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the challenge of modeling nonlinear, path-dependent rewards in contextual bandits. It does so with signature transforms, which map raw trajectories into a high-dimensional signature space. In that space, continuous reward functionals can be approximated by linear functionals, which is the universal nonlinearity property of signatures. Building on this representation, the authors propose DisSigUCB, a signature-based disjoint upper confidence bound (UCB) algorithm that carries efficient linear contextual bandit methods over to this setting. Under boundedness and non-degeneracy assumptions, a high-probability, data-dependent analysis yields a sublinear regret bound of Õ(√((d+m)KT)). Here d and m denote the dimensions of the context and signature features, respectively, K is the number of arms, and T is the time horizon. The evaluation covers synthetic experiments and applications to temperature sensor monitoring, sleep-stage classification, and hospital nurse staffing. In these nonlinear and path-dependent settings, DisSigUCB consistently outperforms classical linear and kernelized contextual bandit baselines.
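The universal nonlinearity property says that a continuous functional of a path can be approximated by a linear functional of the path's signature, which is the sequence of its iterated integrals. The NumPy function below is a minimal sketch of how the first two signature levels can be built with Chen's identity. It rests on several illustrative assumptions that are not the paper's stated settings: a piecewise-linear path, truncation at depth 2, and an optional time channel. The function name `signature_depth2` is hypothetical. For c channels the output has m = c + c² entries; more generally, depth N gives m = c + c² + ... + c^N.

```python
import numpy as np


def signature_depth2(path, add_time=True):
    """Depth-2 truncated signature of a piecewise-linear path (illustrative sketch).

    path: array of shape (L, n), L samples of an n-dimensional stream
          (a 1-D array is treated as a single channel).
    Returns a flat vector [level-1 (c entries), level-2 (c*c entries)],
    where c = n, plus 1 if the assumed time channel is added.
    """
    x = np.asarray(path, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # scalar series -> one-channel path
    if add_time:
        # Assumed time augmentation: prepend a channel running from 0 to 1.
        t = np.linspace(0.0, 1.0, len(x))[:, None]
        x = np.hstack([t, x])
    c = x.shape[1]
    level1 = np.zeros(c)        # S^i: total increment of channel i
    level2 = np.zeros((c, c))   # S^{ij}: iterated integral of dx^i then dx^j
    for d in np.diff(x, axis=0):
        # Chen's identity: append a linear segment whose own signature is
        # (1, d, d⊗d/2); the cross term uses level1 *before* this segment.
        level2 += np.outer(level1, d) + 0.5 * np.outer(d, d)
        level1 += d
    return np.concatenate([level1, level2.ravel()])
```

A reward model that is linear in these features, such as `w @ signature_depth2(path)` for some weight vector `w`, can represent path functionals that are nonlinear in the raw samples. Two examples are the squared net displacement of a channel (the diagonal terms S^{ii}) and the signed Lévy area (the antisymmetric part of level 2).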
πŸ“ Abstract
We study contextual bandits with nonlinear and path-dependent rewards through a novel signature-transform-based approach. Leveraging the universal nonlinearity property of signatures, we approximate continuous path-dependent reward functionals by linear functionals in the signature space. This representation enables the use of efficient linear contextual bandit methods while preserving expressive sequential structure. Building on this framework, we propose \texttt{DisSigUCB}, a signature-based disjoint upper confidence bound (UCB) algorithm. Under boundedness and non-degeneracy assumptions, we prove a high-probability data-dependent sublinear regret bound of order \(\tilde{\mathcal O}(\sqrt{(d+m)KT})\) where \(d\) is the context dimension and \(m\) is the signature feature dimension. Synthetic experiments and numerical applications on temperature sensor monitoring, sleep-stage classification, and hospital nurse staffing demonstrate that \texttt{DisSigUCB} consistently outperforms classical linear and kernelized contextual bandit baselines in nonlinear and path-dependent settings.
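The abstract describes DisSigUCB as a disjoint UCB algorithm on top of the signature representation. The sketch below shows the standard disjoint LinUCB rule, which keeps one ridge-regression model per arm and picks the arm with the largest estimate plus exploration bonus. It assumes each arm's feature vector concatenates a d-dimensional context with m signature features, so every per-arm model has dimension d+m. This is not the authors' algorithm. The class name `DisjointLinUCB`, the fixed exploration weight `alpha`, and the ridge parameter `lam` are hypothetical choices. The paper's data-dependent confidence radius is not reproduced here.

```python
import numpy as np


class DisjointLinUCB:
    """Disjoint LinUCB sketch: an independent ridge model per arm.

    Features are assumed to be [context; signature] vectors of length dim = d + m.
    alpha (exploration weight) and lam (ridge penalty) are illustrative choices.
    """

    def __init__(self, n_arms, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha
        # Per-arm Gram matrices A_k = lam*I + sum x x^T and responses b_k = sum r x.
        self.A = np.stack([lam * np.eye(dim) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, dim))

    def select(self, feats):
        """feats: (n_arms, dim) array, row k = feature vector for arm k.
        Returns the index of the arm with the largest upper confidence bound."""
        feats = np.asarray(feats, dtype=float)
        scores = np.empty(len(feats))
        for k, x in enumerate(feats):
            theta = np.linalg.solve(self.A[k], self.b[k])  # ridge estimate for arm k
            z = np.linalg.solve(self.A[k], x)              # A_k^{-1} x
            # Estimated reward plus exploration bonus alpha * ||x||_{A_k^{-1}}.
            scores[k] = theta @ x + self.alpha * np.sqrt(x @ z)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Rank-one update of the chosen arm's statistics only (the 'disjoint' part)."""
        x = np.asarray(x, dtype=float)
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

When every arm sees the same context, the same row can be repeated for each arm. When each arm carries its own observed path, its row would instead hold that arm's signature features.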
Problem

Research questions and friction points this paper is trying to address.

contextual bandits
nonlinear rewards
path-dependent rewards
sequential decision-making
Innovation

Methods, ideas, or system contributions that make the work stand out.

signature transform
contextual bandits
path-dependent rewards
nonlinear rewards
DisSigUCB