Learning to Play Multi-Follower Bayesian Stackelberg Games

📅 2025-10-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper studies online learning for the leader in multi-follower Bayesian Stackelberg games: over $T$ rounds, the private types of $n$ followers are drawn from an unknown distribution, and the leader learns solely from type feedback or action feedback, aiming to minimize cumulative regret. We establish the first online learning framework for multi-follower Bayesian Stackelberg games and propose a unified algorithmic design paradigm that addresses three core challenges: unknown type distribution, incomplete feedback, and exponential dependence on $n$. Under type feedback, we achieve regret bounds of $O\big(\sqrt{\min\{L\log(nKAT),\, nK\}\cdot T}\big)$ for independent type distributions and $O\big(\sqrt{\min\{L\log(nKAT),\, K^n\}\cdot T}\big)$ for general type distributions. Under action feedback, we obtain $O\big(\min\{\sqrt{n^L K^L A^{2L} L T \log T},\, K^n\sqrt{T}\log T\}\big)$ and provide a nearly matching lower bound of $\Omega\big(\sqrt{\min\{L, nK\}\, T}\big)$.

📝 Abstract
In a multi-follower Bayesian Stackelberg game, a leader plays a mixed strategy over $L$ actions, to which $n \ge 1$ followers, each having one of $K$ possible private types, best respond. The leader's optimal strategy depends on the distribution of the followers' private types. We study an online learning version of this problem: a leader interacts for $T$ rounds with $n$ followers whose types are sampled from an unknown distribution every round. The leader's goal is to minimize regret, defined as the difference between the cumulative utility of the optimal strategy and that of the actually chosen strategies. We design learning algorithms for the leader under different feedback settings. Under type feedback, where the leader observes the followers' types after each round, we design algorithms that achieve $\mathcal{O}\big(\sqrt{\min\{L\log(nKAT),\, nK\}\cdot T}\big)$ regret for independent type distributions and $\mathcal{O}\big(\sqrt{\min\{L\log(nKAT),\, K^n\}\cdot T}\big)$ regret for general type distributions. Interestingly, these bounds do not grow with $n$ at a polynomial rate. Under action feedback, where the leader only observes the followers' actions, we design algorithms with $\mathcal{O}\big(\min\{\sqrt{n^L K^L A^{2L} L T \log T},\, K^n\sqrt{T}\log T\}\big)$ regret. We also provide a lower bound of $\Omega\big(\sqrt{\min\{L, nK\}\, T}\big)$, almost matching the type-feedback upper bounds.
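The regret notion above can be illustrated with a minimal sketch. Everything below is a hypothetical toy setup, not the paper's algorithm: a single follower ($n=1$), a fixed leader-utility table that collapses the followers' best responses, and a naive explore-then-commit learner that estimates the type distribution from type feedback.

```python
import numpy as np

rng = np.random.default_rng(0)

L, K, T = 3, 2, 1000  # leader actions, follower types, rounds

# Hypothetical utility of each pure leader action against each follower type.
# In the paper the leader plays a mixed strategy and followers best-respond;
# here that interaction is collapsed into a fixed table for illustration.
utility = rng.uniform(size=(L, K))

# The unknown type distribution (single follower for simplicity).
type_dist = np.array([0.7, 0.3])

# Expected utility of each action under the true distribution; the benchmark
# in the regret definition is the best fixed action in hindsight.
expected = utility @ type_dist
opt = expected.max()

# Naive learner under type feedback: explore briefly, then commit to the
# empirically best action given the observed type frequencies.
counts = np.zeros(K)
cum_regret = 0.0
for t in range(T):
    if t < 100:
        a = t % L  # round-robin exploration
    else:
        a = int(np.argmax(utility @ (counts / counts.sum())))
    theta = rng.choice(K, p=type_dist)  # type drawn fresh each round
    counts[theta] += 1                  # type feedback: theta is observed
    cum_regret += opt - expected[a]     # per-round (pseudo-)regret

print(round(cum_regret, 2))
```

After the exploration phase the empirical estimate of `type_dist` concentrates, so the committed action is (with high probability) optimal and the regret stops accumulating; the bulk of the regret comes from the 100 exploration rounds.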

Problem

Research questions and friction points this paper is trying to address.

Optimizing leader strategies in multi-follower Bayesian Stackelberg games with unknown type distributions
Designing online learning algorithms that minimize the leader's regret under different feedback settings
Analyzing regret bounds for type and action feedback scenarios in sequential interactions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Online learning algorithms for Bayesian Stackelberg games
Regret minimization under type and action feedback
Type-feedback regret bounds that grow only logarithmically, not polynomially, with the number of followers