🤖 AI Summary
This paper studies payoff manipulation by a leader facing a follower with an unknown linear utility in repeated multi-objective Stackelberg games: the follower's preference weights must be inferred online through sequential interaction, so the leader must balance preference learning against immediate utility maximisation. The authors propose manipulation strategies based on expected utility (EU) and long-term expected utility (longEU), which steer the follower's preferences without explicit negotiation or prior knowledge of the follower's utility function, and prove that longEU converges to the optimal manipulation under infinitely repeated interaction. The approach integrates linear utility modelling, Bayesian preference inference, and incentive-aware decision-making. Empirical evaluation across benchmark environments shows improved cumulative leader utility together with mutually beneficial, Pareto-improving outcomes.
📝 Abstract
We study payoff manipulation in repeated multi-objective Stackelberg games, where a leader may strategically influence a follower's deterministic best response, e.g., by offering a share of their own payoff. We assume that the follower's utility function, representing preferences over multiple objectives, is unknown but linear, and its weight parameter must be inferred through interaction. This introduces a sequential decision-making challenge for the leader, who must balance preference elicitation with immediate utility maximisation. We formalise this problem and propose manipulation policies based on expected utility (EU) and long-term expected utility (longEU), which guide the leader in selecting actions and offering incentives that trade off short-term gains with long-term impact. We prove that under infinite repeated interactions, longEU converges to the optimal manipulation. Empirical results across benchmark environments demonstrate that our approach improves cumulative leader utility while promoting mutually beneficial outcomes, all without requiring explicit negotiation or prior knowledge of the follower's utility function.
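To make the EU idea concrete, here is a minimal toy sketch, not the paper's implementation: a grid posterior over the follower's unknown linear-utility weight, a deterministic follower best response, and a leader that picks the action maximising expected utility under that posterior, then updates the posterior from the observed response. All payoff numbers, dimensions, and names (`leader_payoff`, `follower_best_response`, etc.) are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy instance: 2 leader actions x 3 follower actions,
# bi-objective vector payoffs (illustrative, not from the paper).
rng = np.random.default_rng(0)
leader_payoff = rng.uniform(size=(2, 3, 2))    # leader's vector payoffs
follower_payoff = rng.uniform(size=(2, 3, 2))  # follower's vector payoffs

# Grid posterior over the follower's unknown weight w = (w, 1 - w).
ws = np.linspace(0.0, 1.0, 101)
weights = np.stack([ws, 1.0 - ws], axis=1)
posterior = np.full(len(weights), 1.0 / len(weights))

def follower_best_response(a, w):
    """Deterministic best response: argmax_b of w . follower_payoff[a, b]."""
    return int(np.argmax(follower_payoff[a] @ w))

def expected_utility(a, leader_w, posterior):
    """Leader's EU of action a under the current posterior over follower weights."""
    eu = 0.0
    for p, w in zip(posterior, weights):
        b = follower_best_response(a, w)
        eu += p * (leader_payoff[a, b] @ leader_w)
    return eu

leader_w = np.array([0.5, 0.5])  # leader's own (known) objective weights
true_w = np.array([0.3, 0.7])    # hidden follower preference, to be inferred

for t in range(10):
    # Greedy EU action selection (longEU would also weigh future information value).
    a = max(range(2), key=lambda a: expected_utility(a, leader_w, posterior))
    b = follower_best_response(a, true_w)
    # Bayesian update: zero out weights inconsistent with the observed response.
    consistent = np.array([follower_best_response(a, w) == b for w in weights])
    posterior = posterior * consistent
    posterior = posterior / posterior.sum()
```

Each interaction shrinks the posterior's support to weights consistent with every observed best response, so the leader's EU estimates sharpen over time; the paper's longEU policy additionally accounts for the long-term value of the information an action elicits.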