🤖 AI Summary
Existing causal discovery methods struggle to combine strong empirical performance with rigorous theoretical guarantees: practical algorithms (e.g., GES, GraN-DAG) lack finite-sample statistical guarantees, while theoretically grounded approaches scale poorly. This paper proposes the first game-theoretic adversarial reinforcement learning framework for causal discovery, in which a DDQN agent competes against a strong baseline (e.g., GES), always initializing from the baseline's current solution. The framework comes with three provable guarantees: (i) the learned DAG is never worse than the baseline's, (ii) warm-starting accelerates convergence, and (iii) the algorithm converges to the optimal DAG with high probability. Empirical validation confirms that the theoretical error bound decreases with sample size on 30-node synthetic data; on real-world benchmarks (Sachs, Asia, Hepar2 with 70 nodes, Dream with 100 nodes, and Andes with 220 nodes), the method consistently outperforms state-of-the-art approaches.
📝 Abstract
Causal discovery remains a central challenge in machine learning, yet existing methods face a fundamental gap: algorithms such as GES and GraN-DAG achieve strong empirical performance but lack finite-sample guarantees, while theoretically principled approaches fail to scale. We close this gap by introducing a game-theoretic reinforcement learning framework for causal discovery, in which a DDQN agent competes directly against a strong baseline (GES or GraN-DAG), always warm-starting from the opponent's solution. This design yields three provable guarantees: the learned graph is never worse than the opponent's, warm-starting strictly accelerates convergence, and, most importantly, the algorithm selects the true best candidate graph with high probability. To the best of our knowledge, these are the first finite-sample guarantees of this kind in causal discovery, and they are matched empirically: on synthetic SEMs (30 nodes), the observed error probability decays with the sample size n, tightly matching the theory. On real-world benchmarks including Sachs, Asia, Alarm, Child, Hepar2, Dream, and Andes, our method consistently improves upon GES and GraN-DAG while remaining theoretically safe, and it scales to large graphs such as Hepar2 (70 nodes), Dream (100 nodes), and Andes (220 nodes). Together, these results establish a new class of RL-based causal discovery algorithms that are simultaneously provably consistent, sample-efficient, and practically scalable, a decisive step toward unifying empirical performance with rigorous finite-sample theory.
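The "never worse than the opponent" guarantee described above rests on a simple acceptance rule: the agent warm-starts from the baseline's DAG and commits a candidate graph only if its score is at least as good. The sketch below illustrates that rule with random local search standing in for the paper's DDQN policy and a toy scoring function standing in for a real data-driven score such as BIC; all names and details here are illustrative assumptions, not the paper's implementation.

```python
import random

# Illustrative sketch (not the paper's implementation) of the warm-start
# acceptance rule: start from the baseline's DAG and accept a local edit
# only if the score does not decrease, so the result is never worse.

def score(adj):
    # Toy stand-in for a data-driven DAG score such as BIC: here we just
    # measure closeness to a fixed target edge set, for a deterministic demo.
    target = {(0, 1), (1, 2)}
    edges = {(i, j) for i, row in enumerate(adj) for j, v in enumerate(row) if v}
    return -len(edges ^ target)  # 0 is best

def is_dag(adj):
    # Acyclicity check via DFS with three-color marking.
    n, state = len(adj), [0] * len(adj)
    def dfs(u):
        state[u] = 1
        for v in range(n):
            if adj[u][v]:
                if state[v] == 1 or (state[v] == 0 and not dfs(v)):
                    return False
        state[u] = 2
        return True
    return all(state[u] or dfs(u) for u in range(n))

def flip(adj, i, j):
    # Return a copy of the adjacency matrix with edge (i, j) toggled.
    new = [row[:] for row in adj]
    new[i][j] = 1 - new[i][j]
    return new

def refine(baseline_adj, steps=200, seed=0):
    # The "agent" here is plain random local search; the real method trains
    # a DDQN policy, but the never-worse guarantee comes from the acceptance
    # rule below, not from how candidates are proposed.
    rng = random.Random(seed)
    best, best_score = baseline_adj, score(baseline_adj)
    for _ in range(steps):
        i, j = rng.randrange(len(best)), rng.randrange(len(best))
        if i == j:
            continue
        cand = flip(best, i, j)
        if is_dag(cand) and score(cand) >= best_score:
            best, best_score = cand, score(cand)
    return best, best_score
```

Because `refine` starts from the baseline's graph and its score is monotonically non-decreasing under the acceptance rule, the output can never score below the baseline: the sketch-level analogue of guarantee (i).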