Contextual Continuum Bandits: Static Versus Dynamic Regret

📅 2024-06-09

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This paper studies dynamic regret minimization in contextual continuum-armed bandits: at each round, an agent observes a context and selects an action from a convex set, with the goal of minimizing the objective function associated with that context. Methodologically, we propose an interior-point-inspired algorithm that uses self-concordant barrier functions and operates under noisy observations. Our contributions are threefold: (i) We establish the first theoretical framework for context-dependent dynamic regret, overcoming the limitations of conventional static regret analysis; (ii) Assuming the objective functions are Hölder continuous with respect to the context, we prove that any algorithm with sublinear static regret can be extended to achieve sublinear dynamic regret; for strongly convex and smooth objectives with noisy observations, our algorithm attains the minimax optimal rate up to a logarithmic factor; (iii) Through a minimax lower bound, we show that no algorithm can achieve sublinear dynamic regret when the functions are not continuous with respect to the context.

📝 Abstract

We study the contextual continuum bandits problem, where the learner sequentially receives a side information vector and has to choose an action in a convex set, minimizing a function associated with the context. The goal is to minimize all the underlying functions for the received contexts, leading to a dynamic (contextual) notion of regret, which is stronger than the standard static regret. Assuming that the objective functions are Hölder with respect to the contexts, we demonstrate that any algorithm achieving a sub-linear static regret can be extended to achieve a sub-linear dynamic regret. We further study the case of strongly convex and smooth functions when the observations are noisy. Inspired by the interior point method and employing self-concordant barriers, we propose an algorithm achieving a sub-linear dynamic regret. Lastly, we present a minimax lower bound, implying two key facts. First, no algorithm can achieve sub-linear dynamic regret over functions that are not continuous with respect to the context. Second, for strongly convex and smooth functions, the algorithm that we propose achieves, up to a logarithmic factor, the minimax optimal rate of dynamic regret as a function of the number of queries.
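
For readers who want the two regret notions side by side, the block below writes them out in one common form. The notation is assumed for illustration and is not taken from the paper; in the noisy setting these quantities would typically be taken in expectation.

```latex
% Notation below is assumed for illustration, not taken from the paper.
% \Theta: convex action set, c_t: context at round t, z_t: action played,
% f(\cdot, c): objective associated with context c, T: number of rounds.
\begin{align*}
\text{Static regret:}\quad
  & \sum_{t=1}^{T} f(z_t, c_t) - \min_{z \in \Theta} \sum_{t=1}^{T} f(z, c_t), \\
\text{Dynamic regret:}\quad
  & \sum_{t=1}^{T} \Bigl( f(z_t, c_t) - \min_{z \in \Theta} f(z, c_t) \Bigr), \\
\text{H\"older condition:}\quad
  & |f(z, c) - f(z, c')| \le L \, \|c - c'\|^{\gamma},
    \qquad \gamma \in (0, 1],\ L > 0 .
\end{align*}
```

Because a sum of per-round minima is never larger than the minimum of the sum, the dynamic regret is always at least the static regret, which is the sense in which it is the stronger notion.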

Problem

Research questions and friction points this paper is trying to address.

Minimizing dynamic regret in contextual continuum bandits.

Extending static regret algorithms to achieve sub-linear dynamic regret.

Proposing an algorithm for sub-linear dynamic regret with noisy observations.
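
The abstract does not spell out how a static-regret algorithm is extended to the dynamic setting. One common way to realize such a reduction, assumed here purely for illustration, is to partition the context space into cells and run an independent copy of the base algorithm in each cell. The sketch below does this for scalar contexts in [0, 1]; the class names, the one-point gradient base learner, the toy quadratic objective, and all parameter values are hypothetical and are not the paper's method.

```python
import random


class OnePointGradientLearner:
    """Hypothetical base learner on the interval [0, 1].

    Uses a one-point gradient estimate from a single noisy function value,
    in the spirit of bandit convex optimization. It stands in for any
    algorithm with sub-linear static regret; it is not the paper's method.
    """

    def __init__(self, step=0.05, delta=0.1, rng=None):
        self.center = 0.5
        self.step = step
        self.delta = delta
        self.rng = rng or random.Random(0)
        self.sign = 1.0

    def act(self):
        # Perturb the current center by +/- delta and play that point.
        self.sign = self.rng.choice((-1.0, 1.0))
        return self.center + self.delta * self.sign

    def update(self, value):
        # One-point gradient estimate, then a projected step that keeps
        # center +/- delta inside [0, 1].
        grad = value * self.sign / self.delta
        z = self.center - self.step * grad
        self.center = min(1.0 - self.delta, max(self.delta, z))


class BinnedContextualLearner:
    """Runs one independent base learner per cell of a uniform grid on [0, 1]."""

    def __init__(self, num_bins, make_learner):
        self.num_bins = num_bins
        self.make_learner = make_learner
        self.learners = {}
        self.last_bin = None

    def bin_index(self, context):
        # Contexts are assumed to lie in [0, 1]; 1.0 falls in the last cell.
        return min(int(context * self.num_bins), self.num_bins - 1)

    def act(self, context):
        self.last_bin = self.bin_index(context)
        if self.last_bin not in self.learners:
            self.learners[self.last_bin] = self.make_learner()
        return self.learners[self.last_bin].act()

    def update(self, value):
        # Feedback goes only to the learner that chose the last action.
        self.learners[self.last_bin].update(value)


def run_demo(rounds=2000, num_bins=8, seed=0):
    rng = random.Random(seed)
    agent = BinnedContextualLearner(
        num_bins, lambda: OnePointGradientLearner(rng=rng)
    )
    actions = []
    for _ in range(rounds):
        c = rng.random()
        z = agent.act(c)
        # Toy objective whose minimizer equals the context, plus Gaussian noise.
        noisy_value = (z - c) ** 2 + rng.gauss(0.0, 0.01)
        agent.update(noisy_value)
        actions.append(z)
    return agent, actions


if __name__ == "__main__":
    agent, actions = run_demo()
    print(len(agent.learners), "cells used")
```

The number of cells trades off two effects in this kind of reduction: finer cells make the functions within a cell closer to one another (which is where a Hölder-type assumption would enter), while coarser cells give each base learner more rounds to learn from.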

Innovation

Methods, ideas, or system contributions that make the work stand out.

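Extends any algorithm with sub-linear static regret to one with sub-linear dynamic regret, assuming the objectives are Hölder with respect to the context.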

Uses self-concordant barriers for algorithm design.

Achieves the minimax optimal dynamic regret rate, up to a logarithmic factor, for strongly convex and smooth functions.

🔎 Similar Papers

Batched Nonparametric Contextual Bandits