🤖 AI Summary
This paper studies causal knowledge transfer across heterogeneous environments (observational and experimental) in structural causal bandits (SCBs). To address the learning inefficiency caused by scarce data in the deployment environment, it introduces transportability into the SCB framework for the first time. The proposed method combines multi-source causal graphs, counterfactual reasoning, and multi-environment invariance constraints to enable robust policy learning: prior causal knowledge is used both to prune the action space and to refine reward estimation. The authors establish a sub-linear regret bound that depends explicitly on the amount of available prior information. Empirical evaluations show that the algorithm converges to a good policy significantly faster than conventional online bandit methods and improves decision-making performance, validating the effectiveness of causal transfer for bandit learning.
📝 Abstract
Intelligent agents equipped with causal knowledge can optimize their action spaces to avoid unnecessary exploration. The structural causal bandit framework provides a graphical characterization for identifying actions that cannot maximize the reward, leveraging prior knowledge of the underlying causal structure. While such knowledge enables an agent to estimate the expected rewards of some actions from others during online interaction, there has been little guidance on how to transfer information inferred from arbitrary combinations of datasets collected under different conditions -- observational or experimental -- and from heterogeneous environments. In this paper, we investigate the structural causal bandit with transportability, where priors from the source environments are fused to enhance learning in the deployment setting. We demonstrate that it is possible to exploit invariances across environments to consistently improve learning. The resulting bandit algorithm achieves a sub-linear regret bound with an explicit dependence on the informativeness of the prior data, and it may outperform standard bandit approaches that rely solely on online learning.
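The mechanics described above (a pruned action space plus reward estimates warm-started from transported prior data) can be illustrated with a minimal sketch. This is not the paper's algorithm: it is a plain UCB routine whose arm set is assumed to be already pruned by some causal criterion, and whose per-arm statistics are initialized from hypothetical prior counts and means transported from the source environments. All names (`transport_ucb`, `pruned_arms`, `prior_counts`, `prior_means`) are illustrative.

```python
import numpy as np

def transport_ucb(pruned_arms, prior_counts, prior_means, pull, horizon, alpha=2.0):
    """UCB over a causally pruned arm set, warm-started with prior statistics.

    pruned_arms  : arms surviving the (assumed) causal pruning step
    prior_counts : arm -> effective number of prior samples transported
                   from the source environments (0 if nothing transfers)
    prior_means  : arm -> prior estimate of the arm's expected reward
    pull         : callback arm -> observed reward in the deployment environment
    """
    counts = {a: prior_counts.get(a, 0) for a in pruned_arms}
    means = {a: prior_means.get(a, 0.0) for a in pruned_arms}
    for t in range(horizon):
        # Play any arm with no samples at all (prior or online) once.
        untried = [a for a in pruned_arms if counts[a] == 0]
        if untried:
            arm = untried[0]
        else:
            total = sum(counts.values())
            # Prior samples shrink the confidence radius, so arms that the
            # source data already covers well are explored less online.
            arm = max(
                pruned_arms,
                key=lambda a: means[a] + np.sqrt(alpha * np.log(total) / counts[a]),
            )
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]
    return means

# Toy deployment environment with Bernoulli rewards (illustrative only).
rng = np.random.default_rng(0)
true_p = {"do(X=0)": 0.3, "do(X=1)": 0.6}
means = transport_ucb(
    pruned_arms=list(true_p),
    prior_counts={"do(X=1)": 50},   # transported source-environment samples
    prior_means={"do(X=1)": 0.58},  # prior reward estimate from the sources
    pull=lambda a: float(rng.random() < true_p[a]),
    horizon=1000,
)
```

The warm start is where the prior information enters: the more (and the more informative) transported samples an arm carries, the less the deployment-phase agent needs to explore it, which is the intuition behind a regret bound that depends explicitly on the informativeness of the prior data.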