🤖 AI Summary
This paper studies regret minimization in causal bandits where the causal structure is unknown but causal sufficiency is assumed. Whereas existing approaches rely either on parent-node identification or on joint causal graph learning, we first reveal a fundamental tension: accurate identification of the reward variable's parents is inherently incompatible with regret minimization, and full causal graph recovery is unnecessary. To address this, we propose a novel algorithm that bypasses both causal graph reconstruction and parent-set estimation, instead directly exploiting the combinatorial structure of the action space for decision-making. We derive tight information-theoretic regret lower bounds and prove that our algorithm achieves near-optimal regret in both regimes, whether the number of parents is known or unknown. Empirical evaluations across diverse environments demonstrate substantial improvements over state-of-the-art baselines. Our work establishes a new paradigm for causal reinforcement learning that decouples regret minimization from causal discovery.
📝 Abstract
We study regret minimization in causal bandits under causal sufficiency, where the underlying causal structure is not known to the agent. Previous work has focused on identifying the reward's parents and then applying classic bandit methods to them, or on jointly learning the parents while minimizing regret. We investigate whether such strategies are optimal. Somewhat counterintuitively, our results show that learning the parent set is suboptimal: we prove that there exist instances where regret minimization and parent identification are fundamentally conflicting objectives. We further analyze both the known and unknown parent set size regimes, and establish novel regret lower bounds that capture the combinatorial structure of the action space. Building on these insights, we propose nearly optimal algorithms that bypass graph and parent recovery, demonstrating that parent identification is indeed unnecessary for regret minimization. Experiments confirm a large performance gap between our method and existing baselines across various environments.
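The paper's algorithm is not spelled out in the abstract, but the setting it studies can be made concrete. The sketch below is an illustrative toy, not the authors' method: a hand-picked two-variable structural causal model (the SCM, its probabilities, and all names are assumptions for the demo), an action space of atomic interventions plus pure observation, and a generic UCB baseline that treats each action as an arm while measuring cumulative regret against the best action's true mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy SCM (illustrative only): binary X1, X2 are the parents of reward Y.
# P(Y=1 | X1, X2), values chosen arbitrarily for the demo.
p_y = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}
P_X = 0.5  # each unintervened X_i ~ Bernoulli(0.5)

# Action space: pure observation plus atomic interventions do(X_i = v).
actions = [("observe", None)] + [(i, v) for i in (1, 2) for v in (0, 1)]

def pull(action):
    """Sample the SCM under the given intervention; return the binary reward."""
    x1, x2 = int(rng.integers(2)), int(rng.integers(2))
    target, v = action
    if target == 1:
        x1 = v
    elif target == 2:
        x2 = v
    return float(rng.random() < p_y[(x1, x2)])

def expected_reward(action):
    """Exact mean reward of an action; used only to measure regret."""
    target, v = action
    total = 0.0
    for x1 in (0, 1):
        for x2 in (0, 1):
            w1 = 1.0 if x1 == v else 0.0 if target == 1 else P_X
            w2 = 1.0 if x2 == v else 0.0 if target == 2 else P_X
            w1 = (1.0 if x1 == v else 0.0) if target == 1 else P_X
            w2 = (1.0 if x2 == v else 0.0) if target == 2 else P_X
            total += w1 * w2 * p_y[(x1, x2)]
    return total

# Generic UCB1 over the action space (a baseline, not the paper's algorithm).
T = 5000
n = np.zeros(len(actions))   # pull counts
s = np.zeros(len(actions))   # reward sums
reward_sum = 0.0
for t in range(T):
    if t < len(actions):
        a = t  # initialize: play each arm once
    else:
        a = int(np.argmax(s / n + np.sqrt(2 * np.log(t + 1) / n)))
    r = pull(actions[a])
    n[a] += 1
    s[a] += r
    reward_sum += r

best = max(expected_reward(a) for a in actions)
regret = best * T - reward_sum
print(f"best mean reward {best:.2f}, cumulative regret {regret:.1f}")
```

In this toy instance the best action is an atomic intervention (here do(X2 = 1), with mean 0.7), and plain UCB over the five arms eventually finds it; the point of the paper is that smarter use of the action space's combinatorial structure, without recovering the graph or the parent set, can do substantially better as the number of variables grows.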