Thompson Sampling for Multi-Objective Linear Contextual Bandit

📅 2025-11-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the multi-objective linear contextual bandit problem, where multiple potentially conflicting objectives must be simultaneously optimized. To this end, we propose MOL-TS—the first Thompson sampling-based algorithm achieving Pareto optimality—introducing the notion of an “effective Pareto frontier” to enable joint parameter sampling across objectives and eliminate redundant arm selections. MOL-TS integrates high-dimensional linear feature modeling, online learning, and dynamic estimation of the Pareto-optimal solution set to ensure efficient arm selection. We establish a Pareto regret bound of $\widetilde{O}(d^{3/2}\sqrt{T})$, the first rigorous regret guarantee for any Thompson sampling method in the multi-objective linear bandit setting. Empirical evaluations demonstrate that MOL-TS significantly outperforms existing baselines in both Pareto regret control and balanced multi-objective performance.

📝 Abstract
We study the multi-objective linear contextual bandit problem, where multiple possibly conflicting objectives must be optimized simultaneously. We propose \texttt{MOL-TS}, the \textit{first} Thompson Sampling algorithm with Pareto regret guarantees for this problem. Unlike standard approaches that compute an empirical Pareto front each round, \texttt{MOL-TS} samples parameters across objectives and efficiently selects an arm from a novel \emph{effective Pareto front}, which accounts for repeated selections over time. Our analysis shows that \texttt{MOL-TS} achieves a worst-case Pareto regret bound of $\widetilde{O}(d^{3/2}\sqrt{T})$, where $d$ is the dimension of the feature vectors and $T$ is the total number of rounds, matching the best known order for randomized linear bandit algorithms in the single-objective setting. Empirical results confirm the benefits of our proposed approach, demonstrating improved regret minimization and strong multi-objective performance.
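The arm-selection loop the abstract describes — draw one posterior sample per objective, then choose among the arms that are Pareto-optimal under the sampled rewards — can be sketched as follows. This is a minimal illustration of multi-objective linear Thompson Sampling with Gaussian posteriors, not the authors' MOL-TS: the effective Pareto front refinement from the paper is not reproduced, and the names `pareto_front`, `select_arm`, and the variance scale `v` are illustrative assumptions.

```python
import numpy as np

def pareto_front(rewards):
    """Indices of arms whose reward vectors are not dominated by any other arm."""
    n = len(rewards)
    front = []
    for i in range(n):
        dominated = any(
            np.all(rewards[j] >= rewards[i]) and np.any(rewards[j] > rewards[i])
            for j in range(n) if j != i
        )
        if not dominated:
            front.append(i)
    return front

rng = np.random.default_rng(0)
d, m = 5, 2            # feature dimension, number of objectives
B = np.eye(d)          # regularized Gram matrix (updated online in practice)
mu = np.zeros((m, d))  # per-objective posterior means (updated online)

def select_arm(contexts, v=1.0):
    """One round: sample a parameter per objective, then pick an arm
    uniformly from the Pareto front of the sampled reward vectors."""
    cov = v**2 * np.linalg.inv(B)
    # Joint parameter sampling: one posterior draw per objective
    thetas = np.stack([rng.multivariate_normal(mu[k], cov) for k in range(m)])
    sampled = contexts @ thetas.T      # (n_arms, m) matrix of sampled rewards
    front = pareto_front(sampled)
    return front[rng.integers(len(front))]

contexts = rng.standard_normal((10, d))
arm = select_arm(contexts)
```

After observing the reward vector for the chosen arm, `B` and `mu` would be updated with the standard ridge-regression recursions, exactly as in single-objective linear Thompson Sampling.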
Problem

Research questions and friction points this paper is trying to address.

Optimizes conflicting objectives in the multi-objective linear contextual bandit setting
Proposes Thompson Sampling algorithm with Pareto regret guarantees
Achieves efficient regret bound matching single-objective best known order
Innovation

Methods, ideas, or system contributions that make the work stand out.

Thompson Sampling for multi-objective linear contextual bandit
Efficient arm selection from novel effective Pareto front
Achieves Pareto regret bound matching single-objective best order
Somangchan Park
Seoul National University
Heesang Ann
Seoul National University
Min-hwan Oh
Seoul National University
Reinforcement Learning, Bandit Algorithms, Machine Learning