The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability

📅 2025-06-11
🤖 AI Summary
This paper addresses strategic policy learning in multi-agent online learning under the dual constraints of information asymmetry and knowledge transportability. The core challenge is to identify confounders from non-i.i.d. action sequences while transferring policies across environments. To this end, the paper proposes the first unified framework that jointly models information asymmetry and causal transfer within a strategic interaction setting, integrating online reinforcement learning, causal inference, and game theory. The approach yields an algorithm that learns an ε-optimal policy with a tight sample complexity of O(1/ε²). Unlike existing methods that rely on i.i.d. assumptions or static environments, the algorithm explicitly handles non-stationarity and strategic interdependence, improving learning efficiency and generalization in dynamic, competitive multi-agent settings.

📝 Abstract
Information asymmetry is a pervasive feature of multi-agent systems, especially evident in economics and social sciences. In these settings, agents tailor their actions based on private information to maximize their rewards. These strategic behaviors often introduce complexities due to confounding variables. Simultaneously, knowledge transportability poses another significant challenge, arising from the difficulties of conducting experiments in target environments. It requires transferring knowledge from environments where empirical data is more readily available. Against these backdrops, this paper explores a fundamental question in online learning: Can we employ non-i.i.d. actions to learn about confounders even when requiring knowledge transfer? We present a sample-efficient algorithm designed to accurately identify system dynamics under information asymmetry and to navigate the challenges of knowledge transfer effectively in reinforcement learning, framed within an online strategic interaction model. Our method provably achieves learning of an $\epsilon$-optimal policy with a tight sample complexity of $O(1/\epsilon^2)$.
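
To give a feel for what a $1/\epsilon^2$ rate means in practice, the sketch below uses the standard Hoeffding bound for estimating the mean of a bounded i.i.d. variable. This is an illustration of the scaling only, not the paper's algorithm: the paper's setting is non-i.i.d. and strategic, and the function name `hoeffding_samples` and the confidence parameter `delta` are assumptions introduced here.

```python
import math


def hoeffding_samples(epsilon: float, delta: float = 0.05) -> int:
    """Number of i.i.d. samples in [0, 1] needed so that the empirical mean
    is within `epsilon` of the true mean with probability >= 1 - delta.

    Derived from Hoeffding's inequality: n >= ln(2 / delta) / (2 * epsilon^2).
    Hypothetical helper for illustration; not taken from the paper.
    """
    if not 0 < epsilon < 1:
        raise ValueError("epsilon must lie in (0, 1)")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))


if __name__ == "__main__":
    # Halving epsilon roughly quadruples the required sample count.
    for eps in (0.2, 0.1, 0.05):
        print(f"epsilon={eps}: n >= {hoeffding_samples(eps)}")
```

Halving the target accuracy multiplies the required number of samples by about four, which is the practical meaning of an $O(1/\epsilon^2)$ dependence.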
Problem

Research questions and friction points this paper is trying to address.

Addresses online strategic decision-making with information asymmetry
Explores knowledge transportability in non-i.i.d. learning settings
Develops a sample-efficient algorithm for ε-optimal policy learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Non-i.i.d. actions for confounder learning
Sample-efficient algorithm for system dynamics
Tight O(1/ε²) sample complexity for learning an ε-optimal policy
Authors

Jiachen Hu
Peking University
reinforcement learning

Rui Ai
Massachusetts Institute of Technology
reinforcement learning, game theory

Han Zhong
Peking University
machine learning

Xiaoyu Chen
National Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University

Liwei Wang
Center for Data Science, Peking University; National Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University

Zhaoran Wang
Associate Professor at Northwestern University
deep reinforcement learning, data-driven decision-making, optimization under uncertainty

Zhuoran Yang
Yale University
machine learning, optimization, reinforcement learning, statistics