Asymmetric Graph Error Control With Low Complexity in Causal Bandits

📅 2024-08-20

🏛️ IEEE Transactions on Signal Processing

📈 Citations: 4

✨ Influential: 0

career value

204K/year

🤖 AI Summary

This paper studies the causal Bandit problem under unknown causal graphs: maximizing long-term reward via sequential interventions when both the causal topology and intervention distributions are unknown and potentially non-stationary. We propose a novel online learning framework comprising: (1) an asymmetric graph error control mechanism that separately bounds false positives and false negatives; (2) a joint subgraph-level causal discovery and change detection strategy, substantially reducing sample complexity; and (3) an integrated pipeline combining least-squares weight estimation, problem-specific uncertainty quantification, UCB-style intervention selection, and online abrupt change detection. Evaluated on 100 randomly generated causal Bandit instances, our method reduces average sample complexity by 52% and increases cumulative reward by 85% over state-of-the-art baselines—demonstrating consistent superiority in both stationary and non-stationary environments.

Technology Category

Application Category

📝 Abstract

In this paper, the causal bandit problem is investigated, with the objective of maximizing the long-term reward by selecting an optimal sequence of interventions on nodes in an unknown causal graph. It is assumed that both the causal topology and the distribution of interventions are unknown. First, based on the difference between the two types of graph identification errors (false positives and negatives), a causal graph learning method is proposed. Numerical results suggest that this method has a much lower sample complexity relative to the prior art by learning sub-graphs. However, we note that a sample complexity analysis for the new algorithm has not been undertaken, as of yet. Under the assumption of minimum-mean squared error weight estimation, a new uncertainty bound tailored to the causal bandit problem is derived. This uncertainty bound drives an upper confidence bound-based intervention selection to optimize the reward. Further, we consider a particular instance of non-stationary bandits wherein both the causal topology and interventional distributions can change. Our solution is the design of a sub-graph change detection mechanism that requires a modest number of samples. Numerical results compare the new methodology to existing schemes and show a substantial performance improvement in stationary and non-stationary settings. Averaged over 100 randomly generated causal bandits, the proposed scheme takes significantly fewer samples to learn the causal structure and achieves a reward gain of 85% compared to existing approaches.

Problem

Research questions and friction points this paper is trying to address.

Optimizing long-term reward via causal graph interventions

Reducing sample complexity in causal graph learning

Handling non-stationary causal topology and distributions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Asymmetric graph error control for causal learning

Sub-graph change detection for non-stationary settings

Uncertainty bound-driven intervention selection optimization

🔎 Similar Papers

No similar papers found.

Amazon

Arlington, VA, USA / Bellevue, WA, USA / Boston, MA, USA

Senior AI Engineer - Monetization Platform

Yahoo

United States of America

Research Engineer, Monetization AI