Bandit Pareto Set Identification in a Multi-Output Linear Model

📅 2025-07-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses Pareto set identification (PSI) in multi-output linear bandits. Each arm is represented by a feature vector, and its expected output vector is a linear function of those features through an unknown parameter matrix Θ. The goal is to sample arms adaptively so as to identify all non-dominated arms. The paper introduces what it describes as the first optimal design-based algorithms for PSI, with nearly optimal guarantees in both the fixed-budget and the fixed-confidence settings. Notably, the difficulty of both tasks depends mainly on the sub-optimality gaps of only h arms, where h is the feature dimension, rather than on the gaps of every arm. The theoretical results are supported by an extensive benchmark on synthetic and real-world datasets.

📝 Abstract

We study the Pareto Set Identification (PSI) problem in a structured multi-output linear bandit model. In this setting, each arm is associated with a feature vector belonging to $\mathbb{R}^h$, and its mean vector in $\mathbb{R}^d$ linearly depends on this feature vector through a common unknown matrix $\Theta \in \mathbb{R}^{h \times d}$. The goal is to identify the set of non-dominated arms by adaptively collecting samples from the arms. We introduce and analyze the first optimal design-based algorithms for PSI, providing nearly optimal guarantees in both the fixed-budget and the fixed-confidence settings. Notably, we show that the difficulty of these tasks mainly depends on the sub-optimality gaps of $h$ arms only. Our theoretical results are supported by an extensive benchmark on synthetic and real-world datasets.
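To make the model in the abstract concrete, the following minimal sketch builds a small instance and computes its Pareto set when $\Theta$ is known. The feature matrix `X`, the matrix `Theta`, and the helper `pareto_set` are hypothetical and chosen only for illustration. The dominance convention used here (larger is better; an arm is dominated if another arm is at least as good on every objective and strictly better on at least one) is a common one and is an assumption, not necessarily the paper's exact definition.

```python
import numpy as np


def pareto_set(means):
    """Return the indices of the non-dominated rows of `means`.

    Assumed convention: larger is better, and row j dominates row i when
    it is >= on every objective and > on at least one.
    """
    means = np.asarray(means, dtype=float)
    n = means.shape[0]
    keep = []
    for i in range(n):
        dominated = any(
            np.all(means[j] >= means[i]) and np.any(means[j] > means[i])
            for j in range(n)
            if j != i
        )
        if not dominated:
            keep.append(i)
    return keep


# Hypothetical instance: K = 4 arms, h = 2 features, d = 2 objectives.
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.2]])  # K x h
Theta = np.array([[1.0, 0.0], [0.0, 1.0]])                      # h x d

# Row k of `means` is the mean vector Theta^T x_k of arm k.
means = X @ Theta
pareto = pareto_set(means)
print(pareto)  # [0, 1, 2]: arm 3 is dominated by arm 2
```

In the bandit problem $\Theta$ is unknown, so the learner only has noisy samples of these mean vectors and must decide which arms to pull to recover the same set.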

Problem

Research questions and friction points this paper is trying to address.

Identify Pareto set in multi-output linear bandit model

Adaptively sample arms to find non-dominated ones

Analyze nearly optimal algorithms for the fixed-budget and fixed-confidence settings

Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimal design-based algorithms for PSI

Linear bandit model with feature vectors

Difficulty governed mainly by the sub-optimality gaps of only h arms
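As a rough illustration of what an optimal design over arm features looks like, the sketch below approximates a G-optimal design with Frank-Wolfe iterations. This is an assumption for illustration only: the source says the algorithms are optimal design-based but does not state which design criterion or solver they use. The function `g_optimal_design` and the feature matrix `X` are hypothetical.

```python
import numpy as np


def g_optimal_design(X, n_iter=1000, tol=1e-6):
    """Approximate a G-optimal design over the rows of X (K x h).

    Illustrative Frank-Wolfe sketch, not the paper's procedure.
    Returns a probability vector w over arms that approximately
    minimises max_k x_k^T V(w)^{-1} x_k, with V(w) = sum_k w_k x_k x_k^T.
    """
    K, h = X.shape
    w = np.full(K, 1.0 / K)  # start from the uniform design
    for _ in range(n_iter):
        V = X.T @ (w[:, None] * X)
        V_inv = np.linalg.pinv(V)
        # lev[k] = x_k^T V^{-1} x_k, the variance proxy for arm k
        lev = np.einsum("ki,ij,kj->k", X, V_inv, X)
        k = int(np.argmax(lev))
        g = lev[k]
        # Kiefer-Wolfowitz: the optimal value of the max leverage is h
        if g <= h * (1.0 + tol):
            break
        step = (g / h - 1.0) / (g - 1.0)  # closed-form line-search step
        w = (1.0 - step) * w
        w[k] += step
    return w


# Hypothetical feature matrix: K = 4 arms in dimension h = 2.
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.2]])
w = g_optimal_design(X)
print(np.round(w, 3))
```

A design of this kind concentrates sampling on a few informative arms, so that the least-squares estimate of the unknown matrix is uniformly accurate across all arm features.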

🔎 Similar Papers

Fast and Sample Efficient Multi-Task Representation Learning in Stochastic Contextual Bandits