AI Summary
This paper addresses strategic policy learning in multi-agent online learning under two constraints: information asymmetry and knowledge transferability. The core challenge is to identify confounders from non-i.i.d. action sequences and to transfer policies across environments. To this end, we propose the first unified framework that jointly models information asymmetry and causal transfer within a strategic interaction setting, integrating online reinforcement learning, causal inference, and game theory. Our approach yields an algorithm that learns an ε-optimal policy with a tight sample complexity bound of O(1/ε²). Unlike existing methods that rely on i.i.d. assumptions or static environments, our algorithm explicitly handles non-stationarity and strategic interdependence, improving learning efficiency and generalization in dynamic, competitive multi-agent settings.
Abstract
Information asymmetry is a pervasive feature of multi-agent systems, especially in economics and the social sciences. In these settings, agents tailor their actions to private information in order to maximize their rewards. Such strategic behavior often introduces confounding variables that complicate learning. At the same time, knowledge transportability poses another significant challenge: because experiments are difficult to conduct in the target environment, knowledge must be transferred from environments where empirical data are more readily available. Against this backdrop, this paper explores a fundamental question in online learning: can we use non-i.i.d. actions to learn about confounders even when knowledge transfer is required? We present a sample-efficient algorithm that accurately identifies system dynamics under information asymmetry and effectively handles knowledge transfer in reinforcement learning, framed within an online strategic interaction model. Our method provably learns an $\epsilon$-optimal policy with a tight sample complexity of $O(1/\epsilon^2)$.
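To make the $O(1/\epsilon^2)$ rate concrete, the sketch below uses a standard two-sided Hoeffding bound for estimating the mean of i.i.d. variables bounded in $[0, 1]$. This is only an illustration of how a $1/\epsilon^2$ dependence behaves (halving $\epsilon$ roughly quadruples the required samples); it is not the paper's algorithm, its constants, or its non-i.i.d. analysis. The function name `hoeffding_sample_size` and the confidence parameter `delta` are assumptions introduced for illustration.

```python
import math


def hoeffding_sample_size(epsilon: float, delta: float = 0.05) -> int:
    """Hypothetical helper: number of i.i.d. samples in [0, 1] needed so the
    empirical mean is within epsilon of the true mean with probability at
    least 1 - delta (two-sided Hoeffding bound).

    Illustrates a 1/epsilon^2 dependence only; this is NOT the paper's
    bound, which concerns non-i.i.d. actions in a strategic setting.
    """
    if epsilon <= 0 or not (0 < delta < 1):
        raise ValueError("need epsilon > 0 and 0 < delta < 1")
    # n >= ln(2/delta) / (2 * epsilon^2)
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


if __name__ == "__main__":
    # Each halving of epsilon multiplies the sample count by about 4.
    for eps in (0.1, 0.05, 0.025):
        print(f"epsilon={eps}: n={hoeffding_sample_size(eps)}")
```

Holding `delta` fixed isolates the $\epsilon$ dependence; in the paper's setting, additional factors (for example, problem-dependent constants) would also enter the bound.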