Online Scalarization in Vector-Valued Games

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses how to select linear scalarization weights online in repeated multi-player vector-valued games, so that play is steered toward a preferred equilibrium. The authors propose a two-timescale bi-level learning framework: a slow outer learner adaptively selects scalarization weights from a finite candidate set, while a fast inner bandit no-regret learner selects actions using the scalar feedback induced by the current weights. The key idea is to treat the scalarization weights as online decision variables rather than as a fixed modeling choice. The algorithms are based on bandit online mirror descent with stabilized importance weighting and come with sublinear regret bounds. In experiments on a vector-valued extension of a canonical game, convergence to the preferred equilibrium rises from roughly 50% under non-adaptive scalarization to about 80% under the proposed method.

📝 Abstract

We study repeated multi-player vector-valued games in which a player observes a payoff vector each round and evaluates outcomes through linear scalarizations of those vectors. Unlike most prior work, we treat the choice of scalarization as an online decision variable rather than a fixed modeling decision. We propose a bi-level learning framework in which an outer learner chooses a scalarization from a finite candidate class on a slow timescale, while a faster inner bandit no-regret learner selects actions using the scalar feedback induced by the chosen scalarization. Performance is defined with respect to a true weight vector, and the deployed scalarizations act as control signals that shape the induced payoff trajectory. We provide implementable algorithms based on bandit online mirror descent with stabilized importance weighting, and we derive finite-time performance guarantees in the form of sublinear regret bounds. Experiments on a vector-valued extension of a canonical game show that convergence to the preferred equilibrium rises from roughly $50\%$ under non-adaptive scalarization to about $80\%$ under our proposed method.
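The two-timescale structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm. It makes several assumptions that are not in the source: both learners use an EXP3-style update (online mirror descent with a negative-entropy regularizer), uniform exploration mixing stands in for the paper's "stabilized" importance weighting, the inner learner is reset at the start of each epoch, the outer learner is scored by the epoch's average payoff under the true weight vector, payoff entries lie in [0, 1], and a hypothetical `payoff_fn` environment stands in for the other players. All function and parameter names are illustrative.

```python
import math
import random


def exp3_probs(log_weights, gamma):
    """Softmax over log-weights, mixed with uniform exploration (assumed stabilizer)."""
    m = max(log_weights)
    exps = [math.exp(x - m) for x in log_weights]
    total = sum(exps)
    k = len(exps)
    return [(1.0 - gamma) * e / total + gamma / k for e in exps]


def scalarize(w, payoff_vec):
    """Linear scalarization: inner product of weights and payoff vector."""
    return sum(wi * ri for wi, ri in zip(w, payoff_vec))


def run_bilevel(payoff_fn, n_actions, candidates, true_w,
                n_epochs=200, epoch_len=100,
                eta_in=0.1, eta_out=0.1, gamma=0.05, seed=0):
    """Slow outer learner over candidate weights, fast inner bandit over actions."""
    rng = random.Random(seed)
    outer_log = [0.0] * len(candidates)
    history = []
    for _ in range(n_epochs):
        # Outer (slow) step: sample one candidate scalarization for the epoch.
        p_out = exp3_probs(outer_log, gamma)
        j = rng.choices(range(len(candidates)), weights=p_out)[0]
        w = candidates[j]

        # Inner (fast) learner, reset each epoch (an assumption of this sketch).
        inner_log = [0.0] * n_actions
        true_total = 0.0
        for _ in range(epoch_len):
            p_in = exp3_probs(inner_log, gamma)
            a = rng.choices(range(n_actions), weights=p_in)[0]
            r = payoff_fn(a, rng)  # payoff vector for the chosen action
            # The inner learner only sees scalar feedback under the deployed w.
            inner_log[a] += eta_in * scalarize(w, r) / p_in[a]
            true_total += scalarize(true_w, r)

        # Outer update: importance-weighted average payoff under the true weights.
        avg_true = true_total / epoch_len
        outer_log[j] += eta_out * avg_true / p_out[j]
        history.append((j, avg_true))
    return exp3_probs(outer_log, 0.0), history


if __name__ == "__main__":
    # Toy environment: action 0 pays on objective 0, action 1 on objective 1.
    toy = lambda a, rng: (1.0, 0.0) if a == 0 else (0.0, 1.0)
    probs, _ = run_bilevel(toy, 2, [(1.0, 0.0), (0.0, 1.0)], (1.0, 0.0))
    print(probs)
```

In the toy environment, the candidate that matches the true weights leads the inner learner toward the action that pays on the first objective, so the outer distribution should concentrate on that candidate. In an actual multi-player game, `payoff_fn` would depend on the other players' actions, which is where the deployed scalarization acts as a control signal on the joint trajectory.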

Problem

Research questions and friction points this paper is trying to address.

vector-valued games

online scalarization

multi-player games

linear scalarization

dynamic weight selection

Innovation

Methods, ideas, or system contributions that make the work stand out.

online scalarization

vector-valued games

bi-level learning