Online Scalarization in Vector-Valued Games

📅 2026-05-07

🤖 AI Summary
This work addresses how to select linear scalarization weights dynamically so as to steer play toward a preferred equilibrium in repeated multi-player vector-valued games. The authors propose a two-timescale bi-level learning framework: a slow outer learner adaptively selects scalarization weights from a finite candidate set, while a fast inner bandit no-regret learner selects actions using the scalar feedback induced by the current weights. The key idea is to treat the scalarization weights as online decision variables rather than as a fixed modeling choice. The algorithms are based on bandit online mirror descent with stabilized importance weighting and come with finite-time sublinear regret bounds. In experiments on a vector-valued extension of a canonical game, convergence to the preferred equilibrium rises from roughly 50% under non-adaptive scalarization to about 80% under the proposed method.
📝 Abstract
We study repeated multi-player vector-valued games in which a player observes a payoff vector each round and evaluates outcomes through linear scalarizations of those vectors. Unlike most prior work, we treat the choice of scalarization as an online decision variable rather than a fixed modeling decision. We propose a bi-level learning framework in which an outer learner chooses a scalarization from a finite candidate class on a slow timescale, while a faster inner bandit no-regret learner selects actions using the scalar feedback induced by the chosen scalarization. Performance is measured with respect to a true weight vector, and the deployed scalarizations act as control signals that shape the induced payoff trajectory. We provide implementable algorithms based on bandit online mirror descent with stabilized importance weighting, and we derive finite-time performance guarantees in the form of sublinear regret bounds. Experiments on a vector-valued extension of a canonical game show that convergence to the preferred equilibrium rises from roughly $50\%$ under non-adaptive scalarization to about $80\%$ under our proposed method.
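To make the two-timescale structure concrete, here is a minimal Python sketch, not the authors' implementation. Several pieces are assumptions introduced only for illustration: the class name `Exp3IX`, the function names `scalarize` and `run_bilevel`, the step sizes, and the use of an implicit-exploration estimator (`loss / (p + gamma)`) as one possible form of "stabilized importance weighting". The sketch also replaces the other players with a hypothetical `payoff_fn` environment, assumes payoff vectors lie in $[0,1]^d$ with weights on the simplex, and assumes the outer learner is scored by the epoch-average payoff under the true weight vector.

```python
import numpy as np


def scalarize(payoff_vec, weights):
    """Linear scalarization: inner product of weights and payoff vector."""
    return float(np.dot(weights, payoff_vec))


class Exp3IX:
    """Exponential weights (bandit OMD with a negative-entropy regularizer)
    using implicit-exploration loss estimates: loss / (p + gamma).
    This estimator is an assumed stand-in for the paper's stabilization."""

    def __init__(self, n_arms, lr, gamma, rng):
        self.cum_loss = np.zeros(n_arms)
        self.lr, self.gamma, self.rng = lr, gamma, rng

    def probs(self):
        # Subtract the minimum before exponentiating for numerical stability.
        w = np.exp(-self.lr * (self.cum_loss - self.cum_loss.min()))
        return w / w.sum()

    def draw(self):
        p = self.probs()
        arm = int(self.rng.choice(len(p), p=p))
        return arm, p[arm]

    def update(self, arm, p_arm, loss):
        # Only the played arm gets a (biased, low-variance) loss estimate.
        self.cum_loss[arm] += loss / (p_arm + self.gamma)


def run_bilevel(payoff_fn, candidates, true_w, n_actions,
                n_epochs, epoch_len, seed=0):
    """Outer learner picks a candidate weight once per epoch (slow timescale);
    inner learner picks an action every round (fast timescale)."""
    rng = np.random.default_rng(seed)
    outer = Exp3IX(len(candidates), lr=0.1, gamma=0.05, rng=rng)
    inner = Exp3IX(n_actions, lr=0.1, gamma=0.05, rng=rng)
    history = []
    for _ in range(n_epochs):
        k, p_k = outer.draw()
        w = candidates[k]
        epoch_true = 0.0
        for _ in range(epoch_len):
            a, p_a = inner.draw()
            v = payoff_fn(a, rng)  # hypothetical environment, v in [0,1]^d
            # Inner feedback is the scalar induced by the deployed weights.
            inner.update(a, p_a, 1.0 - scalarize(v, w))
            epoch_true += scalarize(v, true_w)
        avg_true = epoch_true / epoch_len
        # Outer feedback is measured under the true weight vector (assumed).
        outer.update(k, p_k, 1.0 - avg_true)
        history.append((k, avg_true))
    return outer.probs(), history
```

A call such as `run_bilevel(payoff_fn, candidates, true_w, n_actions=2, n_epochs=50, epoch_len=20)` returns the outer learner's final distribution over candidate weights and a per-epoch log of the chosen candidate and its average true-weight payoff. Keeping the inner learner's state across epochs is a design choice of this sketch; resetting it at each epoch boundary is an equally plausible reading of the abstract.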
Problem

Research questions and friction points this paper is trying to address.

vector-valued games
online scalarization
multi-player games
linear scalarization
dynamic weight selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

online scalarization
vector-valued games
bi-level learning
bandit online mirror descent
sublinear regret